Posted to dev@flink.apache.org by Yangze Guo <ka...@gmail.com> on 2020/03/02 04:06:16 UTC

[DISCUSS] FLIP-108: Add GPU support in Flink

Hi everyone,

We would like to start a discussion thread on "FLIP-108: Add GPU
support in Flink"[1].

This FLIP mainly discusses the following issues:

- Enable users to configure how many GPUs a task executor has and
forward such requirements to the external resource managers (for
Kubernetes/Yarn/Mesos setups).
- Provide information about available GPU resources to operators.

Key changes proposed in the FLIP are as follows:

- Forward GPU resource requirements to Yarn/Kubernetes.
- Introduce GPUManager as one of the task manager services to discover
and expose GPU resource information to the context of functions.
- Introduce the default script for GPU discovery, in which we provide
the privilege mode to help users achieve worker-level isolation in
standalone mode.
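
As a rough illustration of the discovery step (a sketch, not the FLIP's
actual implementation): the code below assumes that the discovery script
prints a comma-separated list of GPU indices on stdout, and shows how a
TM-side component could turn that output into the indices it exposes. The
class name GpuDiscoverySketch, the method parseIndices and the output
format are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Flink or of the FLIP. */
final class GpuDiscoverySketch {

    private GpuDiscoverySketch() {}

    /**
     * Parses the (assumed) comma-separated stdout of a discovery script and
     * returns the first requestedAmount GPU indices.
     */
    static List<Integer> parseIndices(String scriptOutput, int requestedAmount) {
        if (requestedAmount < 0) {
            throw new IllegalArgumentException("requestedAmount must not be negative");
        }
        List<Integer> discovered = new ArrayList<>();
        for (String token : scriptOutput.split(",")) {
            String trimmed = token.trim(); // also strips the trailing newline
            if (!trimmed.isEmpty()) {
                discovered.add(Integer.parseInt(trimmed));
            }
        }
        // Fail fast if the machine has fewer GPUs than the configured amount.
        if (discovered.size() < requestedAmount) {
            throw new IllegalStateException(
                    "Requested " + requestedAmount + " GPUs but discovered " + discovered.size());
        }
        return new ArrayList<>(discovered.subList(0, requestedAmount));
    }
}
```

The parsing is kept as a pure function so that launching the script and
interpreting its output stay separate; how the script is actually invoked
is left open here.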

Please find more details in the FLIP wiki document [1]. Looking forward to
your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink

Best,
Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Isaac Godfried <is...@paddlesoft.net>.




---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote ----


> > Can we somehow keep this out of the TaskManager services
> I fear that we could not. IMO, the GPUManager(or
> ExternalServicesManagers in future) is conceptually one of the task
> manager services, just like MemoryManager before 1.10.
> - It maintains/holds the GPU resource at TM level and all of the
> operators allocate the GPU resources from it. So, it should be
> exclusive to a single TaskExecutor.
> - We could add a collection called ExternalResourceManagers to hold
> all managers of other external resources in the future.
>

Can you help me understand why this needs the addition in TaskManagerServices
or in the RuntimeContext?
Are you worried about the case when multiple Task Executors run in the same
JVM? That's not common, but wouldn't it actually be good in that case to
share the GPU Manager, given that the GPU is shared?

Thanks,
Stephan

---------------------------


> > What parts need information about this?
> In this FLIP, operators need the information. Thus, we expose GPU
> information to the RuntimeContext/FunctionContext. The slot profile is
> not aware of GPU resources, as the GPU is a TM-level resource now.
>
> > Can the GPU Manager be a "self contained" thing that simply takes the
> > configuration, and then abstracts everything internally?
> Yes, we just pass the path/args of the discovery script and the number of
> GPUs per TM to it. It takes responsibility for getting the GPU
> information and exposing it to the RuntimeContext/FunctionContext of
> operators. Meanwhile, we'd better not allow operators to access the
> GPUManager directly; they should get what they need from the Context. We
> could then decouple the interface/implementation of GPUManager from the
> Public API.
>
> Best,
> Yangze Guo
>
> On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org> wrote:
> >
> > It sounds fine to initially start with GPU specific support and think about
> > generalizing this once we better understand the space.
> >
> > About the implementation suggested in FLIP-108:
> > - Can we somehow keep this out of the TaskManager services? Anything we
> > have to pull through all layers of the TM makes the TM components yet more
> > complex and harder to maintain.
> >
> > - What parts need information about this?
> > -> do the slot profiles need information about the GPU?
> > -> Can the GPU Manager be a "self contained" thing that simply takes
> > the configuration, and then abstracts everything internally? Operators can
> > access it via "GPUManager.get()" or so?
> >
> >
> >
> > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Thanks for all the feedback.
> > >
> > > @Becket
> > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > Public API section.
> > >
> > >
> > > @Stephan @Becket
> > > Regarding the general extended resource mechanism, I second Xintong's
> > > suggestion.
> > > - It's better to leverage ResourceProfile and ResourceSpec once we
> > > support fine-grained GPU scheduling. As a first-step proposal, I
> > > prefer not to include it in the scope of this FLIP.
> > > - Regarding the "Extended Resource Manager", if I understand
> > > correctly, it is just a code refactoring atm; we could extract the
> > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > that is the case, +1 to do it during implementation.
> > >
> > > @Xingbo
> > > As Xintong said, we looked into how Spark supports a general "Custom
> > > Resource Scheduling" before and decided to introduce a common resource
> > > configuration schema
> > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > to make it more extensible. I think "resource" is a proper level
> > > to contain all the configs of extended resources.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hx...@gmail.com> wrote:
> > > >
> > > > Thanks a lot for the FLIP, Yangze.
> > > >
> > > > There is no doubt that GPU resource management support will greatly
> > > > facilitate the development of AI-related applications by PyFlink users.
> > > >
> > > > I have only one comment about this wiki:
> > > >
> > > > Regarding the names of several GPU configurations, I think it is better to
> > > > delete the resource field, which makes them consistent with the names of
> > > > other resource-related configurations in TaskManagerOption.
> > > >
> > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > taskmanager.gpu.discovery-script.path
> > > >
> > > > Best,
> > > >
> > > > Xingbo
> > > >
> > > >
> > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <to...@gmail.com> wrote:
> > > >
> > > > > @Stephan, @Becket,
> > > > >
> > > > > Actually, Yangze, Yang and I also had an offline discussion about making
> > > > > the "GPU Support" some general "Extended Resource Support". We believe
> > > > > supporting extended resources through a general mechanism is definitely a
> > > > > good and extensible way. The reason we propose this FLIP with its scope
> > > > > narrowed down to GPU alone is mainly the concern about the extra effort and
> > > > > review capacity needed for a general mechanism.
> > > > >
> > > > > To come up with a good design for a general extended resource management
> > > > > mechanism, we would need to investigate more how people use different
> > > > > kinds of resources in practice. For GPU, we learnt such knowledge from the
> > > > > experts, Becket and his team members. But for FPGA, or other potential
> > > > > extended resources, we don't have such convenient information sources,
> > > > > so the investigation would require more effort, which I tend to think is
> > > > > not necessary atm.
> > > > >
> > > > > On the other hand, we also looked into how Spark supports a general "Custom
> > > > > Resource Scheduling". Assuming we want to have a similar general extended
> > > > > resource mechanism in the future, we believe that the current GPU support
> > > > > design can be easily extended in an incremental way, without too much
> > > > > rework.
> > > > >
> > > > > - The most important part is probably user interfaces. Spark offers
> > > > > configuration options to define the amount, discovery script and vendor
> > > > > (on k8s) on a per resource type basis [1], which is very similar to what we
> > > > > proposed in this FLIP. I think it's not necessary to expose config options
> > > > > in the general way atm, since we do not have support for other resource
> > > > > types now. If we later decide to have per resource type config options, we
> > > > > can have backwards compatibility with the currently proposed options via
> > > > > simple key mapping.
> > > > > - For the GPU Manager, if needed later we can change it to an "Extended
> > > > > Resource Manager" (or whatever it is called). That should be a pure
> > > > > component-internal refactoring.
> > > > > - For ResourceProfile and ResourceSpec, there are already fields for
> > > > > general extended resources. We can of course leverage them when supporting
> > > > > fine-grained GPU scheduling. That is also not in the scope of this first
> > > > > step proposal, and would require FLIP-56 to be finished first.
> > > > >
> > > > > To sum up, I agree with Becket that we should have a separate FLIP for the
> > > > > general extended resource mechanism, and keep it in mind when discussing
> > > > > and implementing the current one.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > >
> > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <be...@gmail.com> wrote:
> > > > >
> > > > > > That's a good point, Stephan. It makes total sense to generalize the
> > > > > > resource management to support custom resources. Having that allows users
> > > > > > to add new resources by themselves. The general resource management may
> > > > > > involve two different aspects:
> > > > > >
> > > > > > 1. The custom resource type definition. It is supported by the extended
> > > > > > resources in ResourceProfile and ResourceSpec. This will likely cover
> > > > > > the majority of the cases.
> > > > > >
> > > > > > 2. The custom resource allocation logic, i.e. how to assign the resources
> > > > > > to different tasks, operators, and so on. This may require two levels /
> > > > > > steps:
> > > > > >    a. Subtask level - make sure the subtasks are put into suitable slots.
> > > > > >    It is done by the global RM and is not customizable right now.
> > > > > >    b. Operator level - map the exact resource to the operators in the TM,
> > > > > >    e.g. GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > > > >    assuming the global RM does not distinguish individual resources of the
> > > > > >    same type. It is true for memory, but not for GPU.
> > > > > >
> > > > > > The GPU manager is designed to do 2.b here. So it should discover the
> > > > > > physical GPU information and bind/match them to each operator. Making this
> > > > > > general will fill in the missing piece to support custom resource type
> > > > > > definition. But I'd avoid calling it an "External Resource Manager" to avoid
> > > > > > confusion with RM; maybe something like "Operator Resource Assigner" would
> > > > > > be more accurate. So for each resource type users can have an optional
> > > > > > "Operator Resource Assigner" in the TM. For memory, users don't need this,
> > > > > > but for other extended resources, users may need that.
> > > > > >
> > > > > > Personally I think a pluggable "Operator Resource Assigner" is achievable
> > > > > > in this FLIP. But I am also OK with having that in a separate FLIP because
> > > > > > the interface between the "Operator Resource Assigner" and operator may
> > > > > > take a while to settle down if we want to make it generic. But I think our
> > > > > > implementation should take this future work into consideration so that we
> > > > > > don't need to break backwards compatibility once we have that.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <se...@apache.org> wrote:
> > > > > >
> > > > > > > Thank you for writing this FLIP.
> > > > > > >
> > > > > > > I cannot really give much input into the mechanics of GPU-aware scheduling
> > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > >
> > > > > > > One thought I had when reading the proposal is whether it makes sense to
> > > > > > > look at the "GPU Manager" as an "External Resource Manager", with GPU being
> > > > > > > one such resource.
> > > > > > > The way I understand the ResourceProfile and ResourceSpec, that is how it
> > > > > > > is done there.
> > > > > > > It has the advantage that it looks more extensible. Maybe there is a GPU
> > > > > > > Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an Alibaba
> > > > > > > TPU Resource, etc.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks for the FLIP, Yangze. GPU resource management support is a
> > > > > > > > must-have for machine learning use cases. Actually it is one of the most
> > > > > > > > asked questions from the users who are interested in using Flink for ML.
> > > > > > > >
> > > > > > > > Some quick comments / questions on the wiki.
> > > > > > > > 1. The WebUI / REST API should probably also be mentioned in the public
> > > > > > > > interface section.
> > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion, Yangze.
> > > > > > > > >
> > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink is
> > > > > > > > > significant, especially for the ML scenarios.
> > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think it's a
> > > > > > > > > very good first step for Flink's GPU support.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > > > > support in Flink"[1].
> > > > > > > > > >
> > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > >
> > > > > > > > > > - Enable users to configure how many GPUs a task executor has and
> > > > > > > > > > forward such requirements to the external resource managers (for
> > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > - Provide information about available GPU resources to operators.
> > > > > > > > > >
> > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > >
> > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover
> > > > > > > > > > and expose GPU resource information to the context of functions.
> > > > > > > > > > - Introduce the default script for GPU discovery, in which we provide
> > > > > > > > > > the privilege mode to help users achieve worker-level isolation in
> > > > > > > > > > standalone mode.
> > > > > > > > > >
> > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward
> > > > > > > > > > to your feedback.
> > > > > > > > > >
> > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Yangze Guo <ka...@gmail.com>.
Thank you all for your participation! I'll start voting for this FLIP.

Best,
Yangze Guo
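
To make the config-option approach from the quoted discussion below a bit
more concrete, here is a minimal sketch of how the RM side could derive the
key-value pair for a pod/container request purely from configuration,
without running any user code. It assumes the option names as they are
written in this thread (external-resource.{resourceName}.amount and
external-resource.{resourceName}.yarn/kubernetes.key); the final names in
the FLIP may differ. The class ExternalResourceConfigSketch and the method
toRequestEntry are hypothetical.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Flink or of the FLIP. */
final class ExternalResourceConfigSketch {

    private ExternalResourceConfigSketch() {}

    /**
     * Looks up the deployment-specific key and the amount of one external
     * resource and returns them as a single key-value pair, or an empty map
     * if either option is missing.
     *
     * @param target "yarn" or "kubernetes" (as used in the thread)
     */
    static Map<String, Long> toRequestEntry(
            Map<String, String> conf, String resourceName, String target) {
        String prefix = "external-resource." + resourceName + ".";
        String deploymentKey = conf.get(prefix + target + ".key");
        String amount = conf.get(prefix + "amount");
        if (deploymentKey == null || amount == null) {
            // Resource not configured for this deployment target.
            return Collections.emptyMap();
        }
        return Collections.singletonMap(deploymentKey, Long.parseLong(amount.trim()));
    }
}
```

For example, with external-resource.gpu.amount set to 2 and
external-resource.gpu.kubernetes.key set to nvidia.com/gpu (an example
value), toRequestEntry(conf, "gpu", "kubernetes") returns the single pair
nvidia.com/gpu -> 2, which the resource manager could then attach to the
pod request.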

On Wed, Apr 1, 2020 at 4:55 PM Stephan Ewen <se...@apache.org> wrote:
>
> Sounds good!
>
> On Tue, Mar 31, 2020 at 4:32 AM Yangze Guo <ka...@gmail.com> wrote:
>
> > Hi everyone,
> > > I've updated the FLIP accordingly. The key change is replacing two
> > > resource allocation interfaces with config options.
> >
> > If there are no further comments, I would like to start a voting
> > thread by tomorrow.
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Mar 30, 2020 at 9:15 PM Till Rohrmann <tr...@apache.org>
> > wrote:
> > >
> > > > If there is no need for the ExternalResourceDriver on the RM side, then it
> > > > is always a good idea to keep it simple and not introduce it. One can
> > > > always change things once one realizes that there is a need for it.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Mar 30, 2020 at 12:00 PM Yangze Guo <ka...@gmail.com> wrote:
> > >
> > > > Hi @Till, @Xintong
> > > >
> > > > > I think even without the credential concerns, replacing the interfaces
> > > > > with configuration options is a good idea from my side.
> > > > > - Currently, I don't see any external resource that is incompatible
> > > > > with this mechanism.
> > > > > - It reduces the burden on users of implementing a plugin themselves.
> > > > > WDYT?
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Mon, Mar 30, 2020 at 5:44 PM Xintong Song <to...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > I also agree that the pluggable ExternalResourceDriver should be loaded by
> > > > > > the cluster class loader. Although the plugin might be implemented by users,
> > > > > > external resources (as part of task executor resources) should be cluster
> > > > > > configurations, unlike job-level user code such as UDFs, because the task
> > > > > > executors belong to the cluster rather than to jobs.
> > > > >
> > > > >
> > > > > > IIUC, the concern Stephan raised is about the potential credential problem
> > > > > > when executing user code on the RM with the cluster class loader. The concern
> > > > > > makes sense to me, and I think what Yangze suggested should be a good
> > > > > > approach to prevent such credential problems. The only reason we
> > > > > > tried to execute user code (i.e. getKubernetes/YarnExternalResource) on the
> > > > > > RM was that we need to set these key-value pairs on pod/container requests.
> > > > > > By replacing the interfaces getKubernetes/YarnExternalResource with the
> > > > > > configuration options
> > > > > > 'external-resource.{resourceName}.yarn/kubernetes.key/amount',
> > > > > > we can still fulfill that purpose, without the credential risks.
> > > > >
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > > On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann <tr...@apache.org> wrote:
> > > > > >
> > > > > > > At the moment the RM does not have a user code class loader and I agree
> > > > > > > with Stephan that it should stay like this. This, however, does not mean
> > > > > > > that we cannot support pluggable components in the RM. As long as the
> > > > > > > plugins are on the system's class path, it should be fine for the RM to
> > > > > > > load them. For example, we could add external resources via Flink's plugin
> > > > > > > mechanism or something similar.
> > > > > > >
> > > > > > > A very simple implementation of such an ExternalResourceDriver could be a
> > > > > > > class which simply returns what is written in the flink-conf.yaml under a
> > > > > > > given key.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > > On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo <ka...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi, Stephan,
> > > > > > > >
> > > > > > > > I see your concern and I totally agree with you.
> > > > > > > >
> > > > > > > > The interface on the RM side is now `Map<String key, String/Long value>
> > > > > > > > getYarn/KubernetesExternalResource()`. The only valid information the RM
> > > > > > > > gets from it is the configuration key of that external resource in
> > > > > > > > Yarn/K8s. The "String/Long value" would be the same as the
> > > > > > > > external-resource.{resourceName}.amount.
> > > > > > > > So, I think it makes sense to replace these two interfaces with two
> > > > > > > > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> > > > > > > > may lose some extensibility, but AFAIK it could work with common
> > > > > > > > external resources like GPU, FPGA. WDYT?
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen <se...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > Maybe one final comment: It is probably not an issue, but let's try and
> > > > > > > > > keep user code (via user code classloader) out of the ResourceManager, if
> > > > > > > > > possible.
> > > > > > > > >
> > > > > > > > > As background:
> > > > > > > > >
> > > > > > > > > There were thoughts in the past to support setups where the RM must run
> > > > > > > > > with "superuser" credentials, but we cannot run JM/TM with these
> > > > > > > > > credentials, as the user code might access them otherwise.
> > > > > > > > > This is actually possible today: you can run the RM in a different JVM or
> > > > > > > > > in a different container, and give it more credentials than JMs / TMs. But
> > > > > > > > > for this to be feasible, we cannot allow any user-defined code to be in the
> > > > > > > > > JVM, because that instantaneously breaks the isolation of credentials.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the feedback, @Till and @Xintong.
> > > > > > > > > >
> > > > > > > > > > Regarding separating the interface, I'm also +1 with it.
> > > > > > > > > >
> > > > > > > > > > Regarding the resource allocation interface, true, it's dangerous to
> > > > > > > > > > give too much access to user code. Changing the return type to Map<String
> > > > > > > > > > key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > > > > > > > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > > > > > > > > could also free us from the potential dependency issue.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > > > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thanks for updating the FLIP, Yangze.
> > > > > > > > > > >
> > > > > > > > > > > I agree with Till that we probably want to separate the K8s/Yarn decorator
> > > > > > > > > > > calls. Users can still configure one driver class, and we can use
> > > > > > > > > > > `instanceof` to check whether the driver implements the K8s/Yarn specific
> > > > > > > > > > > interfaces.
> > > > > > > > > > >
> > > > > > > > > > > Moreover, I'm not sure about exposing the entire `ContainerRequest` / `Pod`
> > > > > > > > > > > (`AbstractKubernetesStepDecorator` directly manipulates the `Pod`) to user
> > > > > > > > > > > code. It gives more access to user code than is needed for defining an
> > > > > > > > > > > external resource, which might cause problems. Instead, I would suggest
> > > > > > > > > > > having an interface like `Map<String key, String value>
> > > > > > > > > > > getYarn/KubernetesExternalResource()` and assembling the entries into the
> > > > > > > > > > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <trohrmann@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > I'm a bit late to the party. I think the current proposal looks good.
> > > > > > > > > > > >
> > > > > > > > > > > > Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> > > > > > > > > > > > would suggest not including the decorator calls for Kubernetes and Yarn in
> > > > > > > > > > > > the base interface. Instead I would suggest segregating the deployment
> > > > > > > > > > > > specific decorator calls into separate interfaces. That way an
> > > > > > > > > > > > ExternalResourceDriver does not have to support all deployments from the
> > > > > > > > > > > > very beginning. Moreover, some resources might not be supported by a
> > > > > > > > > > > > specific deployment target and the natural way to express this would be to
> > > > > > > > > > > > not implement the respective deployment specific interface.
> > > > > > > > > > > >
> > > > > > > > > > > > Moreover, having void
> > > > > > > > > > > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > > > > > > > > > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > > > > > > > > > > classpath whenever the external resource driver is being used.
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Till
> > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Nice, thanks a lot!
> > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I've updated the FLIP accordingly. I did not add a
> > > > > > > > > > > > > > ResourceInfoProvider. Instead, I introduced the ExternalResourceDriver,
> > > > > > > > > > > > > > which takes responsibility for all relevant operations on both the RM
> > > > > > > > > > > > > > and TM sides.
> > > > > > > > > > > > > > After a rethink about decoupling the management of external resources
> > > > > > > > > > > > > > from TaskExecutor, I think we could do the same thing on the
> > > > > > > > > > > > > > ResourceManager side. We do not need to add specific allocation
> > > > > > > > > > > > > > logic to the ResourceManager each time we add a specific external
> > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > > > > > > > > > > containerRequest.
> > > > > > > > > > > > > > - For Kubernetes, the ExternalResourceDriver could provide a decorator for
> > > > > > > > > > > > > > the TM pod.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In this way, just like MetricReporter, we allow users to define their
> > > > > > > > > > > > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > > > > > > > > > > > separation of concerns. For more details, please take a look at [1].
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This sounds good to go ahead from my side.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I like the approach that Becket suggested - in that case the core
> > > > > > > > > > > > > > > abstraction that everyone would need to understand would be "external
> > > > > > > > > > > > > > > resource allocation" and the "ResourceInfoProvider", and the GPU specific
> > > > > > > > > > > > > > > code would be a specific implementation only known to that component that
> > > > > > > > > > > > > > > allocates the external resource. That fits the separation of concerns
> > > > > > > > > > > > > > > well.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I also understand that it should not be over-engineered in the first
> > > > > > > > > > > > > > > version, so some simplification makes sense, and then gradually expand
> > > > > > > > > > > > > > > from there.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So +1 to go ahead with what was suggested above (Xintong / Becket) from
> > > > > > > > > > > > > > > my side.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I see your concern, and I completely agree with you that we should first
> > > > > > > > > > > > > > > > think about the "library" / "plugin" / "extension" style if possible.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If GPUs are sliced and assigned during scheduling, there may be reason,
> > > > > > > > > > > > > > > > > although it looks that it would belong to the slot then. Is that what we
> > > > > > > > > > > > > > > > > are doing here?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In the current proposal, we do not have the GPUs
> > > > sliced
> > > > > > and
> > > > > > > > > > > assigned
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > slots, because it could be problematic without
> > > > dynamic
> > > > > > slot
> > > > > > > > > > > > allocation.
> > > > > > > > > > > > > > > E.g., the number of GPUs might not be evenly
> > > > divisible by
> > > > > > > the
> > > > > > > > > > > number
> > > > > > > > > > > > of
> > > > > > > > > > > > > > > slots.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think it makes sense to eventually have the
> > GPUs
> > > > > > > assigned to
> > > > > > > > > > > slots.
> > > > > > > > > > > > > Even
> > > > > > > > > > > > > > > then, we might still need a TM level GPUManager
> > (or
> > > > > > > > > > > ResourceProvider
> > > > > > > > > > > > > like
> > > > > > > > > > > > > > > Becket suggested). For memory, in each slot we
> > can
> > > > simply
> > > > > > > > > request
> > > > > > > > > > > the
> > > > > > > > > > > > > > > amount of memory, leaving it to JVM / OS to
> > decide
> > > > which
> > > > > > > memory
> > > > > > > > > > > > > (address)
> > > > > > > > > > > > > > > should be assigned. For GPU, and potentially
> > other
> > > > > > > resources
> > > > > > > > > like
> > > > > > > > > > > > > FPGA, we
> > > > > > > > > > > > > > > need to explicitly specify which GPU (index)
> > should
> > > > be
> > > > > > > used.
> > > > > > > > > > > > > Therefore, we
> > > > > > > > > > > > > > > need some component at the TM level to coordinate
> > > > which
> > > > > > > slot
> > > > > > > > > uses
> > > > > > > > > > > > which
> > > > > > > > > > > > > > > GPU.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > IMO, unless we say Flink will not support
> > slot-level
> > > > GPU
> > > > > > > > > slicing at
> > > > > > > > > > > > > least
> > > > > > > > > > > > > > > in the foreseeable future, I don't see a good
> > way to
> > > > > > avoid
> > > > > > > > > touching
> > > > > > > > > > > > > the TM
> > > > > > > > > > > > > > > core. To that end, I think Becket's suggestion
> > > > points to
> > > > > > a
> > > > > > > good
> > > > > > > > > > > > > direction,
> > > > > > > > > > > > > > > that supports more features (GPU, FPGA, etc.)
> > with
> > > > less
> > > > > > > > > coupling to
> > > > > > > > > > > > > the TM
> > > > > > > > > > > > > > > core (only needs to understand the general
> > > > interfaces).
> > > > > > The
> > > > > > > > > > > detailed
> > > > > > > > > > > > > > > implementation for specific resource types can
> > even
> > > > be
> > > > > > > > > encapsulated
> > > > > > > > > > > > as
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > library.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for sharing your thoughts on the final
> > state.
> > > > > > > > Setting aside the
> > > > > > > > > > > > > > > details of how
> > > > > > > > > > > > > > > > the interfaces should look, I think this is
> > a
> > > > really
> > > > > > > good
> > > > > > > > > > > > > abstraction
> > > > > > > > > > > > > > > for supporting general resource types.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'd like to further clarify that, the following
> > three
> > > > > > > things
> > > > > > > > > are
> > > > > > > > > > > all
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > the "Flink core" needs to understand.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >    - The *amount* of resource, for scheduling.
> > > > Actually,
> > > > > > we
> > > > > > > > > already
> > > > > > > > > > > > > have
> > > > > > > > > > > > > > >    the Resource class in ResourceProfile and
> > > > ResourceSpec
> > > > > > > for
> > > > > > > > > > > > extended
> > > > > > > > > > > > > > >    resource. It's just not really used.
> > > > > > > > > > > > > > >    - The *info*, that Flink provides to the
> > > > operators /
> > > > > > > user
> > > > > > > > > codes.
> > > > > > > > > > > > > > >    - The *provider*, which generates the info
> > based
> > > > on
> > > > > > the
> > > > > > > > > amount.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The "core" does not need to understand the
> > specific
> > > > > > > > > implementation
> > > > > > > > > > > > > details
> > > > > > > > > > > > > > > of the above three. They can even be implemented
> > in a
> > > > > > > 3rd-party
> > > > > > > > > > > > > library.
> > > > > > > > > > > > > > > Similar to how we allow users to define their
> > custom
> > > > > > > > > > > MetricReporter.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <
> > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the comment, Stephan.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   - If everything becomes a "core feature", it
> > will
> > > > > > make
> > > > > > > the
> > > > > > > > > > > > project
> > > > > > > > > > > > > hard
> > > > > > > > > > > > > > > > > to develop in the future. Thinking "library"
> > /
> > > > > > > "plugin" /
> > > > > > > > > > > > > "extension"
> > > > > > > > > > > > > > > > style
> > > > > > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Completely agree. It is much more important to
> > > > design a
> > > > > > > > > mechanism
> > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > focusing on a specific case. Here is what I am
> > > > thinking
> > > > > > > to
> > > > > > > > > fully
> > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > custom resource management:
> > > > > > > > > > > > > > > > 1. On the JM / RM side, use ResourceProfile and
> > > > > > > ResourceSpec
> > > > > > > > > to
> > > > > > > > > > > > > define
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > resource and the amount required. They will be
> > > > used to
> > > > > > > find
> > > > > > > > > > > > suitable
> > > > > > > > > > > > > TMs
> > > > > > > > > > > > > > > > slots to run the tasks. At this point, the
> > > > resources
> > > > > > are
> > > > > > > only
> > > > > > > > > > > > > measured by
> > > > > > > > > > > > > > > > amount, i.e. they do not have individual ID.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. On the TM side, have something like
> > > > > > > > > *"ResourceInfoProvider"*
> > > > > > > > > > > to
> > > > > > > > > > > > > > > identify
> > > > > > > > > > > > > > > > > and provide the detailed information of the
> > > > individual
> > > > > > > > > resource,
> > > > > > > > > > > > e.g.
> > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > ID. It is important because the operator may
> > have
> > > > to
> > > > > > > > > explicitly
> > > > > > > > > > > > > interact
> > > > > > > > > > > > > > > > with the physical resource it uses. The
> > > > > > > ResourceInfoProvider
> > > > > > > > > > > might
> > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > like something below.
> > > > > > > > > > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > > > > > > > > > >     Map<AbstractID, INFO>
> > > > > > retrieveResourceInfo(OperatorId
> > > > > > > > > opId,
> > > > > > > > > > > > > > > > ResourceProfile resourceProfile);
> > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - There could be several
> > "*ResourceInfoProvider*"
> > > > > > > configured
> > > > > > > > > on
> > > > > > > > > > > the
> > > > > > > > > > > > > TM to
> > > > > > > > > > > > > > > > retrieve the information for different
> > resources.
> > > > > > > > > > > > > > > > > - The TM will be responsible for assigning those
> > > > individual
> > > > > > > > > resources
> > > > > > > > > > > > to
> > > > > > > > > > > > > each
> > > > > > > > > > > > > > > > operator according to their requested amount.
> > > > > > > > > > > > > > > > - The operators will be able to get the
> > > > ResourceInfo
> > > > > > from
> > > > > > > > > their
> > > > > > > > > > > > > > > > RuntimeContext.
> > > > > > > > > > > > > > > >
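Inline, to make the ResourceInfoProvider idea above concrete: a minimal sketch, not the FLIP's actual API. It assumes plain String ids and an int amount in place of Flink's AbstractID, OperatorID and ResourceProfile, and GpuInfo and GpuInfoProvider are hypothetical names. The provider plays the TM-level role described in the bullets: it knows the individual GPU indexes and hands each one to at most one operator, according to the requested amount.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the interface quoted above: String ids and an int
// amount replace Flink's AbstractID, OperatorID and ResourceProfile (assumption).
interface ResourceInfoProvider<INFO> {
    Map<String, INFO> retrieveResourceInfo(String operatorId, int amount);
}

// Hypothetical info type; the thread does not define its fields.
final class GpuInfo {
    final int index;

    GpuInfo(int index) {
        this.index = index;
    }
}

// Hypothetical TM-level provider: each GPU index goes to at most one operator.
final class GpuInfoProvider implements ResourceInfoProvider<GpuInfo> {
    private final Deque<Integer> free = new ArrayDeque<>();

    GpuInfoProvider(List<Integer> discoveredIndexes) {
        free.addAll(discoveredIndexes);
    }

    @Override
    public synchronized Map<String, GpuInfo> retrieveResourceInfo(String operatorId, int amount) {
        if (amount < 0 || amount > free.size()) {
            throw new IllegalStateException(
                "operator " + operatorId + " requested " + amount
                    + " GPUs, but only " + free.size() + " are free");
        }
        // Turn the requested *amount* into concrete, individually identified GPUs.
        Map<String, GpuInfo> assigned = new LinkedHashMap<>();
        for (int i = 0; i < amount; i++) {
            int index = free.removeFirst();
            assigned.put("gpu-" + index, new GpuInfo(index));
        }
        return assigned;
    }
}
```

With three discovered GPUs, an operator asking for two would receive indexes 0 and 1, a second operator asking for one would receive index 2, and any further request would fail. How resources are released again is left out of this sketch.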
> > > > > > > > > > > > > > > > > If we agree this is a reasonable final state,
> > we
> > > > can
> > > > > > > adapt
> > > > > > > > > the
> > > > > > > > > > > > > current
> > > > > > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > > > to it. In fact, it does not sound like a big change
> > to
> > > > me.
> > > > > > All
> > > > > > > the
> > > > > > > > > > > > proposed
> > > > > > > > > > > > > > > > configuration can be as is, it is just that
> > Flink
> > > > > > itself
> > > > > > > > > won't
> > > > > > > > > > > care
> > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > them; instead, a GPUInfoProvider implementing
> > the
> > > > > > > > > > > > ResourceInfoProvider
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > use them.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> > > > > > > > > sewen@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi all!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The main point I wanted to throw into the
> > > > discussion
> > > > > > > is the
> > > > > > > > > > > > > following:
> > > > > > > > > > > > > > > > >   - With more and more use cases, more and
> > more
> > > > tools
> > > > > > > go
> > > > > > > > > into
> > > > > > > > > > > > Flink
> > > > > > > > > > > > > > > > >   - If everything becomes a "core feature",
> > it
> > > > will
> > > > > > > make
> > > > > > > > > the
> > > > > > > > > > > > > project
> > > > > > > > > > > > > > > hard
> > > > > > > > > > > > > > > > > to develop in the future. Thinking "library"
> > /
> > > > > > > "plugin" /
> > > > > > > > > > > > > "extension"
> > > > > > > > > > > > > > > > style
> > > > > > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >   - A good thought experiment is always: How
> > many
> > > > > > > future
> > > > > > > > > > > > developers
> > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > interact with this code (and possibly
> > understand
> > > > it
> > > > > > > > > partially),
> > > > > > > > > > > > > even if
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > features they touch have nothing to do with
> > GPU
> > > > > > > support. If
> > > > > > > > > > > many
> > > > > > > > > > > > > > > > > contributors to unrelated features will have
> > to
> > > > touch
> > > > > > > it
> > > > > > > > > and
> > > > > > > > > > > > > understand
> > > > > > > > > > > > > > > > it,
> > > > > > > > > > > > > > > > > then let's think if there is a different
> > > > solution.
> > > > > > > Maybe
> > > > > > > > > there
> > > > > > > > > > > is
> > > > > > > > > > > > > not,
> > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > then we should be sure why.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >   - That led me to raising this issue: If
> > the GPU
> > > > > > > manager
> > > > > > > > > > > > becomes a
> > > > > > > > > > > > > > > core
> > > > > > > > > > > > > > > > > service in the TaskManager, Environment,
> > > > > > > RuntimeContext,
> > > > > > > > > etc.
> > > > > > > > > > > > then
> > > > > > > > > > > > > > > > everyone
> > > > > > > > > > > > > > > > > > developing TM and streaming tasks needs to
> > > > understand
> > > > > > > the
> > > > > > > > > GPU
> > > > > > > > > > > > > manager.
> > > > > > > > > > > > > > > > That
> > > > > > > > > > > > > > > > > seems oddly specific, is my impression.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Access to configuration seems not the right
> > > > reason to
> > > > > > > do
> > > > > > > > > that.
> > > > > > > > > > > We
> > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > expose the Flink configuration from the
> > > > > > RuntimeContext
> > > > > > > > > anyways.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If GPUs are sliced and assigned during
> > > > scheduling,
> > > > > > > there
> > > > > > > > > may be
> > > > > > > > > > > > > reason,
> > > > > > > > > > > > > > > > > > although it looks like it would belong to the
> > > > slot
> > > > > > > then. Is
> > > > > > > > > > > that
> > > > > > > > > > > > > what
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > are doing here?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song
> > <
> > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > IMO, eventually an operator should only see
> > > > info of
> > > > > > > GPUs
> > > > > > > > > that
> > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > dedicated
> > > > > > > > > > > > > > > > > > for it, instead of all GPUs on the
> > > > > > machine/container
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > > current
> > > > > > > > > > > > > > > > > design.
> > > > > > > > > > > > > > > > > > It does not make sense to let the user who
> > > > writes a
> > > > > > > UDF
> > > > > > > > > > > > worry
> > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > coordination among multiple operators
> > running
> > > > on
> > > > > > the
> > > > > > > same
> > > > > > > > > > > > > machine.
> > > > > > > > > > > > > > > And
> > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > > we want to limit the GPU info an operator
> > > > sees, we
> > > > > > > > > should not
> > > > > > > > > > > > > let the
> > > > > > > > > > > > > > > > > > > operator instantiate GPUManager, which
> > > > means we
> > > > > > > have
> > > > > > > > > to
> > > > > > > > > > > > expose
> > > > > > > > > > > > > > > > > something
> > > > > > > > > > > > > > > > > > through runtime context, either GPU info or
> > > > some
> > > > > > > kind of
> > > > > > > > > > > > limited
> > > > > > > > > > > > > > > access
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > the GPUManager.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin
> > <
> > > > > > > > > > > > becket.qin@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > It probably makes sense for us to first
> > agree
> > > > on
> > > > > > the
> > > > > > > > > final
> > > > > > > > > > > > > state.
> > > > > > > > > > > > > > > More
> > > > > > > > > > > > > > > > > > > specifically, will the resource info be
> > > > exposed
> > > > > > > through
> > > > > > > > > > > > runtime
> > > > > > > > > > > > > > > > context
> > > > > > > > > > > > > > > > > > > eventually?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > If that is the final state and we have a
> > > > seamless
> > > > > > > > > migration
> > > > > > > > > > > > > story
> > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > FLIP to that final state, personally I
> > think
> > > > it
> > > > > > is
> > > > > > > OK
> > > > > > > > > to
> > > > > > > > > > > > > expose the
> > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > info in the runtime context.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong
> > > > Song <
> > > > > > > > > > > > > > > tonysong820@gmail.com
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > > > > > > > > > I think what Stephan means (@Stephan,
> > > > please
> > > > > > > correct
> > > > > > > > > me
> > > > > > > > > > > if
> > > > > > > > > > > > > I'm
> > > > > > > > > > > > > > > > wrong)
> > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > that, we might not need to hold and
> > > > maintain
> > > > > > the
> > > > > > > > > > > GPUManager
> > > > > > > > > > > > > as a
> > > > > > > > > > > > > > > > > > service
> > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > TaskManagerServices or RuntimeContext.
> > An
> > > > > > > > > alternative is
> > > > > > > > > > > to
> > > > > > > > > > > > > > > create
> > > > > > > > > > > > > > > > /
> > > > > > > > > > > > > > > > > > > > retrieve the GPUManager only in the
> > > > operators
> > > > > > > that
> > > > > > > > > need
> > > > > > > > > > > it,
> > > > > > > > > > > > > e.g.,
> > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > > > > > > > > > I agree with you on excluding
> > GPUManager
> > > > from
> > > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >    - For the first step, where we
> > provide
> > > > > > unified
> > > > > > > > > > > TM-level
> > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > information
> > > > > > > > > > > > > > > > > > > >    to all operators, it should be fine
> > to
> > > > have
> > > > > > > > > operators
> > > > > > > > > > > > > access /
> > > > > > > > > > > > > > > > > > > >    lazy-initiate GPUManager by
> > themselves.
> > > > > > > > > > > > > > > > > > > >    - In future, we might have some more
> > > > > > > fine-grained
> > > > > > > > > GPU
> > > > > > > > > > > > > > > > management,
> > > > > > > > > > > > > > > > > > > where
> > > > > > > > > > > > > > > > > > > >    we need to maintain GPUManager as a
> > > > service
> > > > > > > and
> > > > > > > > > put
> > > > > > > > > > > GPU
> > > > > > > > > > > > > info
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > slot
> > > > > > > > > > > > > > > > > > > >    profiles. But at least for now it's
> > not
> > > > > > > necessary
> > > > > > > > > to
> > > > > > > > > > > > > introduce
> > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > >    complexity.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > However, I have some concerns on
> > excluding
> > > > > > > GPUManager
> > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > RuntimeContext
> > > > > > > > > > > > > > > > > > > > > and letting operators access it directly.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >    - Configurations needed for
> > creating the
> > > > > > > > > > GPUManager are
> > > > > > > > > > > > not
> > > > > > > > > > > > > > > > always
> > > > > > > > > > > > > > > > > > > >    available for operators.
> > > > > > > > > > > > > > > > > > > >    - If later we want to have
> > fine-grained
> > > > > > > control
> > > > > > > > > over
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > > (e.g.,
> > > > > > > > > > > > > > > > > > > >    operators in each slot can only see
> > GPUs
> > > > > > > reserved
> > > > > > > > > for
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > > slot),
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I would suggest wrapping the GPUManager
> > > > behind
> > > > > > > > > > > > RuntimeContext
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > only
> > > > > > > > > > > > > > > > > > > > expose the GPUInfo to users. For now,
> > we
> > > > can
> > > > > > > declare
> > > > > > > > > a
> > > > > > > > > > > > method
> > > > > > > > > > > > > > > > > > > > `getGPUInfo()` in RuntimeContext, with
> > a
> > > > > > default
> > > > > > > > > > > definition
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > calls
> > > > > > > > > > > > > > > > > > > > `GPUManager.get()` to get the
> > > > lazily-created
> > > > > > > > > GPUManager.
> > > > > > > > > > > If
> > > > > > > > > > > > > later
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > want
> > > > > > > > > > > > > > > > > > > > to create / retrieve GPUManager in a
> > > > different
> > > > > > > way,
> > > > > > > > > we
> > > > > > > > > > > can
> > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > > > change
> > > > > > > > > > > > > > > > > > > > how `getGPUInfo` is implemented,
> > without
> > > > > > needing
> > > > > > > to
> > > > > > > > > > > change
> > > > > > > > > > > > > any
> > > > > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > > > > > > > >
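Inline, a minimal sketch of the `getGPUInfo()` suggestion above, under stated assumptions. The types below are self-contained stand-ins that only borrow the names used in this thread (RuntimeContext, GPUManager, GPUInfo); they are not Flink's real classes, and the hard-coded GPU index is a placeholder for whatever the discovery step would return. The point is the shape: users only see `getGPUInfo()`, so how the manager is created or retrieved can change later without touching the public interface.

```java
import java.util.List;

// Hypothetical info type exposed to user code; fields are an assumption.
final class GPUInfo {
    final List<Integer> indexes;

    GPUInfo(List<Integer> indexes) {
        this.indexes = indexes;
    }
}

// Stand-in for the lazily created manager discussed in the thread.
final class GPUManager {
    private static GPUManager instance;
    private final GPUInfo info;

    private GPUManager() {
        // Placeholder for the real discovery step (e.g. running a discovery script).
        this.info = new GPUInfo(List.of(0));
    }

    // Creates the manager on first use and returns the same instance afterwards.
    static synchronized GPUManager get() {
        if (instance == null) {
            instance = new GPUManager();
        }
        return instance;
    }

    GPUInfo getGPUInfo() {
        return info;
    }
}

// Stand-in for the runtime context: only GPUInfo is visible to users.
interface RuntimeContext {
    default GPUInfo getGPUInfo() {
        // Swapping this body later (e.g. for a TM-held manager) keeps the
        // public method signature unchanged.
        return GPUManager.get().getGPUInfo();
    }
}
```

A user function would then call only `getRuntimeContext().getGPUInfo()` (or its equivalent) and never reference GPUManager directly.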
> > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze
> > > > Guo <
> > > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > > > > > > > > > Do you mean MiniCluster? Yes, it
> > makes
> > > > sense
> > > > > > to
> > > > > > > > > share
> > > > > > > > > > > the
> > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > Manager
> > > > > > > > > > > > > > > > > > > > > > in such a scenario.
> > > > > > > > > > > > > > > > > > > > > If that's what you worry about, I'm
> > +1
> > > > for
> > > > > > > holding
> > > > > > > > > > > > > > > > > > > > > GPUManager(ExternalResourceManagers)
> > in
> > > > > > > > > TaskExecutor
> > > > > > > > > > > > > instead of
> > > > > > > > > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Regarding the
> > > > RuntimeContext/FunctionContext,
> > > > > > > it
> > > > > > > > > just
> > > > > > > > > > > > > holds the
> > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > info instead of the GPU Manager.
> > AFAIK,
> > > > it's
> > > > > > > the
> > > > > > > > > only
> > > > > > > > > > > > > place we
> > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > > > pass GPU info to the
> > > > > > > > > RichFunction/UserDefinedFunction.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac
> > > > > > Godfried
> > > > > > > <
> > > > > > > > > > > > > > > > > isaac@paddlesoft.net
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20
> > +0000
> > > > > > > > > > > > sewen@apache.org
> > > > > > > > > > > > > > > wrote
> > > > > > > > > > > > > > > > > > ----
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Can we somehow keep this out
> > of the
> > > > > > > > > TaskManager
> > > > > > > > > > > > > services
> > > > > > > > > > > > > > > > > > > > > > > I fear that we could not. IMO,
> > the
> > > > > > > > > GPUManager(or
> > > > > > > > > > > > > > > > > > > > > > > ExternalServicesManagers in
> > future)
> > > > is
> > > > > > > > > conceptually
> > > > > > > > > > > > > one of
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > task
> > > > > > > > > > > > > > > > > > > > > > > manager services, just like
> > > > MemoryManager
> > > > > > > > > before
> > > > > > > > > > > > 1.10.
> > > > > > > > > > > > > > > > > > > > > > > - It maintains/holds the GPU
> > > > resource at
> > > > > > TM
> > > > > > > > > level
> > > > > > > > > > > and
> > > > > > > > > > > > > all
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > operators allocate the GPU
> > resources
> > > > from
> > > > > > > it.
> > > > > > > > > So,
> > > > > > > > > > > it
> > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > exclusive to a single
> > TaskExecutor.
> > > > > > > > > > > > > > > > > > > > > > > - We could add a collection
> > called
> > > > > > > > > > > > > ExternalResourceManagers
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > hold
> > > > > > > > > > > > > > > > > > > > > > > all managers of other external
> > > > resources
> > > > > > > in the
> > > > > > > > > > > > future.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Can you help me understand why this
> > > > needs
> > > > > > the
> > > > > > > > > > > addition
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > TaskManagerServices
> > > > > > > > > > > > > > > > > > > > > > or in the RuntimeContext?
> > > > > > > > > > > > > > > > > > > > > > Are you worried about the case when
> > > > > > multiple
> > > > > > > Task
> > > > > > > > > > > > > Executors
> > > > > > > > > > > > > > > run
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > JVM? That's not common, but
> > wouldn't it
> > > > > > > actually
> > > > > > > > > be
> > > > > > > > > > > > good
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > case
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > share the GPU Manager, given that
> > the
> > > > GPU
> > > > > > is
> > > > > > > > > shared?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > What parts need information about
> > > > this?
> > > > > > > > > > > > > > > > > > > > > > > In this FLIP, operators need the
> > > > > > > information.
> > > > > > > > > Thus,
> > > > > > > > > > > > we
> > > > > > > > > > > > > > > expose
> > > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > > > information to the
> > > > > > > > > RuntimeContext/FunctionContext.
> > > > > > > > > > > > The
> > > > > > > > > > > > > slot
> > > > > > > > > > > > > > > > > > profile
> > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > not aware of GPU resources as
> > GPU is
> > > > TM
> > > > > > > level
> > > > > > > > > > > > resource
> > > > > > > > > > > > > now.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Can the GPU Manager be a "self
> > > > > > contained"
> > > > > > > > > thing
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > configuration, and then abstracts
> > > > > > > everything
> > > > > > > > > > > > > internally?
> > > > > > > > > > > > > > > > > > > > > > > Yes, we just pass the path/args
> > of
> > > > the
> > > > > > > > discovery
> > > > > > > > > > > > script
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > > > > > > > GPUs per TM to it. It takes the
> > > > > > > responsibility
> > > > > > > > > to
> > > > > > > > > > > get
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > > > information and expose them to
> > the
> > > > > > > > > > > > > > > > > RuntimeContext/FunctionContext
> > > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > > > Operators. Meanwhile, we'd
> > better not
> > > > > > allow
> > > > > > > > > > > operators
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > directly
> > > > > > > > > > > > > > > > > > > > > > > > access GPUManager; they should get
> > what
> > > > > > they
> > > > > > > want
> > > > > > > > > > > from
> > > > > > > > > > > > > > > Context.
> > > > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > > > > > then decouple the
> > > > > > interface/implementation
> > > > > > > of
> > > > > > > > > > > > > GPUManager
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > Public
> > > > > > > > > > > > > > > > > > > > > > > API.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > It sounds fine to initially start with GPU specific support and
> > > > > > > > > > > > > > > > > > > > > > > > > > think about generalizing this once we better understand the space.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > > > > > > > > > > > > > > > > > > - Can we somehow keep this out of the TaskManager services?
> > > > > > > > > > > > > > > > > > > > > > > > > > Anything we have to pull through all layers of the TM makes the
> > > > > > > > > > > > > > > > > > > > > > > > > > TM components yet more complex and harder to maintain.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > > > > > > > > > > > > > > > -> do the slot profiles need information about the GPU?
> > > > > > > > > > > > > > > > > > > > > > > > > > -> Can the GPU Manager be a "self contained" thing that simply
> > > > > > > > > > > > > > > > > > > > > > > > > > takes the configuration, and then abstracts everything internally?
> > > > > > > > > > > > > > > > > > > > > > > > > > Operators can access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to
> > > > > > > > > > > > > > > > > > > > > > > > > > > the Public API section.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding the general extended resource mechanism, I second
> > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong's suggestion.
> > > > > > > > > > > > > > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and ResourceSpec once
> > > > > > > > > > > > > > > > > > > > > > > > > > > we support fine-grained GPU scheduling. As a first-step
> > > > > > > > > > > > > > > > > > > > > > > > > > > proposal, I prefer not to include it in the scope of this FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > > > > > > > > > > > > > > > > > > > > > > correctly, it is just a code refactoring atm: we could extract
> > > > > > > > > > > > > > > > > > > > > > > > > > > the open/close/allocateExtendResources of GPUManager to that
> > > > > > > > > > > > > > > > > > > > > > > > > > > interface. If that is the case, +1 to do it during
> > > > > > > > > > > > > > > > > > > > > > > > > > > implementation.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark supports a general
> > > > > > > > > > > > > > > > > > > > > > > > > > > "Custom Resource Scheduling" before and decided to introduce a
> > > > > > > > > > > > > > > > > > > > > > > > > > > common resource configuration schema
> > > > > > > > > > > > > > > > > > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > > > > > > > > > > > to make it more extensible. I think "resource" is a proper
> > > > > > > > > > > > > > > > > > > > > > > > > > > level to contain all the configs of extended resources.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
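
As a rough illustration of the common schema mentioned in the message above (taskmanager.resource.{resourceName}.amount/discovery-script), the sketch below groups such keys by resource name. It is not taken from the FLIP or from Flink itself: the class and function names are hypothetical, and the option names finally adopted may differ from the ones assumed here.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed key layout, modeled on the schema named in the thread:
#   taskmanager.resource.<resourceName>.amount
#   taskmanager.resource.<resourceName>.discovery-script
_KEY = re.compile(r"^taskmanager\.resource\.([^.]+)\.(amount|discovery-script)$")


@dataclass
class ExtendedResourceConfig:
    """Hypothetical holder for the options of one extended resource."""
    name: str
    amount: int = 0
    discovery_script: Optional[str] = None


def parse_extended_resources(conf: Dict[str, str]) -> Dict[str, ExtendedResourceConfig]:
    """Group all matching keys of a flat config by resource name."""
    resources: Dict[str, ExtendedResourceConfig] = {}
    for key, value in conf.items():
        match = _KEY.match(key)
        if match is None:
            continue  # not an extended-resource option
        name, option = match.groups()
        entry = resources.setdefault(name, ExtendedResourceConfig(name))
        if option == "amount":
            entry.amount = int(value)
        else:
            entry.discovery_script = value
    return resources
```

With this layout, adding a new resource type (say "fpga") needs only new keys, not new parsing code, which is the extensibility argument made in the message.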
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hxbks2ks@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > There is no doubt that GPU resource management support will
> > > > > > > > > > > > > > > > > > > > > > > > > > > > greatly facilitate the development of AI-related applications
> > > > > > > > > > > > > > > > > > > > > > > > > > > > by PyFlink users.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding the names of several GPU configurations, I think it
> > > > > > > > > > > > > > > > > > > > > > > > > > > > is better to delete the "resource" field to make them
> > > > > > > > > > > > > > > > > > > > > > > > > > > > consistent with the names of other resource-related
> > > > > > > > > > > > > > > > > > > > > > > > > > > > configurations in TaskManagerOption.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > > > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline discussion
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > about making the "GPU Support" a general "Extended Resource
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Support". We believe supporting extended resources through a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > general mechanism is definitely a good and extensible way.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > The reason we narrowed the scope of this FLIP down to GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > alone is mainly the concern about the extra effort and review
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > To come up with a good design for a general extended resource
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > management mechanism, we would need to investigate more on
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > how people use different kinds of resources in practice. For
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > GPU, we learnt such knowledge from the experts, Becket and
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > his team members. But for FPGA, or other potential extended
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > resources, we don't have such convenient information sources,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > so the investigation would require more effort, which I tend
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > to think is not necessary atm.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > On the other hand, we also looked into how Spark supports a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > general "Custom Resource Scheduling". Assuming we want to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > have a similar general extended resource mechanism in the
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > future, we believe that the current GPU support design can be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > easily extended, in an incremental way without too much
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > rework.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > - The most important part is probably the user interfaces.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Spark offers configuration options to define the amount,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > discovery script and vendor (on k8s) on a per-resource-type
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > basis [1], which is very similar to what we proposed in this
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > FLIP. I think it's not necessary to expose config options in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > the general way atm, since we do not have support for other
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > resource types now. If later we decide to have
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > per-resource-type config options, we can keep backwards
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > compatibility with the currently proposed options through a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > simple key mapping.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can change it to an
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > "Extended Resource Manager" (or whatever it is called). That
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > should be a pure component-internal refactoring.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are already
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > fields for general extended resources. We can of course
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > leverage them when supporting fine-grained GPU scheduling.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > That is also not in the scope of this first-step proposal,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > and would require FLIP-56 to be finished first.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > FLIP for the general extended resource mechanism, and keep it
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > in mind when discussing and implementing the current one.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
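
The "simple key mapping" idea for backwards compatibility in the message above can be pictured roughly as follows. This is only a sketch under assumptions: both sets of key names are placeholders picked for illustration, not the options defined in the FLIP, and `normalize_keys` is a hypothetical helper rather than an existing Flink function.

```python
from typing import Dict

# Placeholder key names: GPU-specific "legacy" options on the left,
# per-resource-type "general" options on the right.
LEGACY_TO_GENERAL = {
    "taskmanager.gpu.amount": "taskmanager.resource.gpu.amount",
    "taskmanager.gpu.discovery-script.path":
        "taskmanager.resource.gpu.discovery-script.path",
}


def normalize_keys(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of conf with legacy keys rewritten to general keys.

    A general key that is set explicitly wins over its legacy alias.
    """
    normalized = {k: v for k, v in conf.items() if k not in LEGACY_TO_GENERAL}
    for legacy, general in LEGACY_TO_GENERAL.items():
        if legacy in conf and general not in normalized:
            normalized[general] = conf[legacy]
    return normalized
```

Existing job configurations would keep working unchanged, while the rest of the code only ever reads the general keys.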
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes total sense to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > generalize the resource management to support custom
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > resources. Having that allows users to add new resources by
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > themselves. The general resource management may involve two
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > different aspects:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type definition. It is supported by
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the extended resources in ResourceProfile and ResourceSpec.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This will likely cover the majority of the cases.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the resources to different tasks, operators, and so on. This
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > may require two levels / steps:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put into
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > suitable slots. It is done by the global RM and is not
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > customizable right now.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource to the operators
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > in the TM, e.g. GPU 1 for operator A, GPU 2 for operator B.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This step is needed assuming the global RM does not
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > distinguish individual resources of the same type. It is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > true for memory, but not for GPU.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it should
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > discover the physical GPU information and bind/match it to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > each operator. Making this general will fill in the missing
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > piece to support custom resource type definition. But I'd
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > avoid calling it an "External Resource Manager" to avoid
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > confusion with the RM; maybe something like "Operator
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Resource Assigner" would be more accurate. So for each
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > resource type users can have an optional "Operator Resource
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Assigner" in the TM. For memory, users don't need this, but
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > for other extended resources, users may need that.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource Assigner"
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > is achievable in this FLIP. But I am also OK with having
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > that in a separate FLIP because the interface between the
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > "Operator Resource Assigner" and the operator may take a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > while to settle down if we want to make it generic. But I
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > think our implementation should take this future work into
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > consideration so that we don't need to break backwards
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > compatibility once we have that.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
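
Step 2.b in the message above (operator-level assignment inside one TM, e.g. GPU 1 for operator A and GPU 2 for operator B) might look roughly like the sketch below. "Operator Resource Assigner" is the name floated in the message, but the class shape, method names and the use of integer device indices are all assumptions made for illustration; nothing here reflects an actual Flink interface.

```python
from typing import Dict, List


class OperatorResourceAssigner:
    """Hypothetical per-TaskManager assigner for one resource type.

    It tracks which individual devices (identified by index) are free
    and which operator currently holds which devices.
    """

    def __init__(self, device_indices: List[int]) -> None:
        self._free: List[int] = list(device_indices)
        self._assigned: Dict[str, List[int]] = {}

    def assign(self, operator_id: str, amount: int) -> List[int]:
        """Give `amount` free devices to the operator, or fail."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if amount > len(self._free):
            raise RuntimeError(
                f"requested {amount} devices, only {len(self._free)} free")
        granted, self._free = self._free[:amount], self._free[amount:]
        self._assigned.setdefault(operator_id, []).extend(granted)
        return granted

    def release(self, operator_id: str) -> None:
        """Return all devices held by the operator to the free pool."""
        self._free.extend(self._assigned.pop(operator_id, []))
```

The point of the sketch is only that the assigner has to track individual devices, which is what distinguishes GPU from a fungible resource such as memory.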
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at
> > > > 12:27 AM
> > > > > > > > > Stephan
> > > > > > > > > > > > Ewen
> > > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > > sewen@apache.org>
> > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the mechanics of GPU-aware scheduling
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the proposal is if it makes sense to look at
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the "GPU Manager" as an "External Resource Manager", and GPU is one such
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that is how it
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It has the advantage that it looks more extensible. Maybe there is a GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an Alibaba
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support is a must-have
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > for machine learning use cases. Actually it is one of the most frequently
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > asked questions from the users who are interested in using Flink for ML.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in the public
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > significant, especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think it's a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > very good first step for Flink's GPU support.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a task executor and
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > forward such requirements to the external resource managers (for
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of available GPU resources to operators.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and expose GPU resource information to the context of functions.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU discovery, in which we provide
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the privilege mode to help user to achieve worker-level isolation in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > to your feedbacks.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Stephan Ewen <se...@apache.org>.
Sounds good!
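
For readers skimming the thread: the change agreed on in the quoted
discussion below (replacing the getYarn/KubernetesExternalResource
interfaces with config options) could look roughly like the sketch that
follows. It is only an illustration under assumptions, not code from the
FLIP. The class and method names are hypothetical, the option names follow
the 'external-resource.{resourceName}.yarn/kubernetes.key' and
'external-resource.{resourceName}.amount' pattern quoted in the thread and
may differ from what Flink finally ships, and yarn.io/gpu / nvidia.com/gpu
are just example resource keys.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: derive the key/amount pair that the ResourceManager
// would put on a container/pod request purely from configuration, so that no
// user code has to run on the RM side.
class ExternalResourceConfigSketch {

    // target is "yarn" or "kubernetes" (assumed spelling of the key segment).
    static Optional<Map.Entry<String, Long>> requestEntry(
            Map<String, String> conf, String resourceName, String target) {
        String prefix = "external-resource." + resourceName + ".";
        String amount = conf.get(prefix + "amount");
        String key = conf.get(prefix + target + ".key");
        if (amount == null || key == null) {
            // Resource not configured for this deployment target.
            return Optional.empty();
        }
        Map.Entry<String, Long> entry =
                new AbstractMap.SimpleImmutableEntry<>(key, Long.valueOf(amount.trim()));
        return Optional.of(entry);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("external-resource.gpu.amount", "2");
        conf.put("external-resource.gpu.yarn.key", "yarn.io/gpu");
        conf.put("external-resource.gpu.kubernetes.key", "nvidia.com/gpu");

        System.out.println(requestEntry(conf, "gpu", "yarn"));
        System.out.println(requestEntry(conf, "gpu", "kubernetes"));
    }
}
```

Because the RM only reads strings from the configuration here, nothing
user-defined needs to be loaded into its JVM, which is the credential
isolation concern raised further down in the quoted thread.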

On Tue, Mar 31, 2020 at 4:32 AM Yangze Guo <ka...@gmail.com> wrote:

> Hi everyone,
> I've updated the FLIP accordingly. The key change is replacing the two
> resource allocation interfaces with config options.
>
> If there are no further comments, I would like to start a voting
> thread by tomorrow.
>
> Best,
> Yangze Guo
>
> On Mon, Mar 30, 2020 at 9:15 PM Till Rohrmann <tr...@apache.org>
> wrote:
> >
> > > If there is no need for the ExternalResourceDriver on the RM side, then it
> > > is always a good idea to keep it simple and not introduce it. One can
> > > always change things once one realizes that there is a need for it.
> >
> > Cheers,
> > Till
> >
> > On Mon, Mar 30, 2020 at 12:00 PM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Hi @Till, @Xintong
> > >
> > > > I think even without the credential concerns, replacing the interfaces
> > > > with configuration options is a good idea in my view.
> > > > - Currently, I don't see any external resource that is incompatible
> > > > with this mechanism.
> > > > - It spares users the burden of implementing a plugin themselves.
> > > > WDYT?
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Mar 30, 2020 at 5:44 PM Xintong Song <to...@gmail.com>
> > > wrote:
> > > >
> > > > > I also agree that the pluggable ExternalResourceDriver should be loaded by
> > > > > the cluster class loader. Although the plugin might be implemented by users,
> > > > > external resources (as part of task executor resources) should be cluster
> > > > > configurations, unlike job-level user code such as UDFs, because the task
> > > > > executors belong to the cluster rather than to jobs.
> > > >
> > > >
> > > > > IIUC, the concern Stephan raised is about the potential credential problem
> > > > > when executing user code on the RM with the cluster class loader. The concern
> > > > > makes sense to me, and I think what Yangze suggested should be a good
> > > > > approach to preventing such credential problems. The only reason we
> > > > > tried to execute user code (i.e. getKubernetes/YarnExternalResource) on the RM
> > > > > was that we need to set these key-value pairs on pod/container requests.
> > > > > By replacing the interfaces getKubernetes/YarnExternalResource with the
> > > > > configuration options
> > > > > 'external-resource.{resourceName}.yarn/kubernetes.key/amount',
> > > > > we can still fulfill that purpose, without the credential risks.
> > > >
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > > On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann <tr...@apache.org> wrote:
> > > >
> > > > > > At the moment the RM does not have a user code class loader and I agree
> > > > > > with Stephan that it should stay like this. This, however, does not mean
> > > > > > that we cannot support pluggable components in the RM. As long as the
> > > > > > plugins are on the system's class path, it should be fine for the RM to
> > > > > > load them. For example, we could add external resources via Flink's plugin
> > > > > > mechanism or something similar.
> > > > > >
> > > > > > A very simple implementation of such an ExternalResourceDriver could be a
> > > > > > class which simply returns what is written in the flink-conf.yaml under a
> > > > > > given key.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > > On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo <ka...@gmail.com> wrote:
> > > > >
> > > > > > Hi, Stephan,
> > > > > >
> > > > > > I see your concern and I totally agree with you.
> > > > > >
> > > > > > > The interface on the RM side is now `Map<String key, String/Long value>
> > > > > > > getYarn/KubernetesExternalResource()`. The only valid information the RM
> > > > > > > gets from it is the configuration key of that external resource in
> > > > > > > Yarn/K8s. The "String/Long value" would be the same as the
> > > > > > > external-resource.{resourceName}.amount.
> > > > > > > So, I think it makes sense to replace these two interfaces with two
> > > > > > > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> > > > > > > may lose some extensibility, but AFAIK it could work with common
> > > > > > > external resources like GPU, FPGA. WDYT?
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen <se...@apache.org> wrote:
> > > > > > >
> > > > > > > > Maybe one final comment: It is probably not an issue, but let's try and
> > > > > > > > keep user code (via user code classloader) out of the ResourceManager, if
> > > > > > > > possible.
> > > > > > > >
> > > > > > > > As background:
> > > > > > > >
> > > > > > > > There were thoughts in the past to support setups where the RM must run
> > > > > > > > with "superuser" credentials, but we cannot run JM/TM with these
> > > > > > > > credentials, as the user code might access them otherwise.
> > > > > > > > This is actually possible today, you can run the RM in a different JVM or
> > > > > > > > in a different container, and give it more credentials than JMs / TMs. But
> > > > > > > > for this to be feasible, we cannot allow any user-defined code to be in the
> > > > > > > > JVM, because that instantaneously breaks the isolation of credentials.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks for the feedback, @Till and @Xintong.
> > > > > > > >
> > > > > > > > Regarding separating the interface, I'm also +1 with it.
> > > > > > > >
> > > > > > > > > Regarding the resource allocation interface, true, it's dangerous to
> > > > > > > > > give user code that much access. Changing the return type to Map<String
> > > > > > > > > key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > > > > > > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > > > > > > > could also free us from the potential dependency issue.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Thanks for updating the FLIP, Yangze.
> > > > > > > > >
> > > > > > > > > > I agree with Till that we probably want to separate the K8s/Yarn decorator
> > > > > > > > > > calls. Users can still configure one driver class, and we can use
> > > > > > > > > > `instanceof` to check whether the driver implements the K8s/Yarn specific
> > > > > > > > > > interfaces.
> > > > > > > > > >
> > > > > > > > > > Moreover, I'm not sure about exposing the entire `ContainerRequest` / `Pod`
> > > > > > > > > > (`AbstractKubernetesStepDecorator` directly manipulates the `Pod`) to user
> > > > > > > > > > code. It gives user code more access than is needed for defining an
> > > > > > > > > > external resource, which might cause problems. Instead, I would suggest
> > > > > > > > > > having an interface like `Map<String key, String value>
> > > > > > > > > > getYarn/KubernetesExternalResource()` and assembling the entries into the
> > > > > > > > > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <trohrmann@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > > I'm a bit late to the party. I think the current proposal looks good.
> > > > > > > > > > >
> > > > > > > > > > > Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> > > > > > > > > > > would suggest to not include the decorator calls for Kubernetes and Yarn in
> > > > > > > > > > > the base interface. Instead I would suggest to segregate the deployment
> > > > > > > > > > > specific decorator calls into separate interfaces. That way an
> > > > > > > > > > > ExternalResourceDriver does not have to support all deployments from the
> > > > > > > > > > > very beginning. Moreover, some resources might not be supported by a
> > > > > > > > > > > specific deployment target and the natural way to express this would be to
> > > > > > > > > > > not implement the respective deployment specific interface.
> > > > > > > > > > >
> > > > > > > > > > > Moreover, having void
> > > > > > > > > > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > > > > > > > > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > > > > > > > > > classpath whenever the external resource driver is being used.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Nice, thanks a lot!
> > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've updated the FLIP accordingly. I do not add a
> > > > > > > > > > > > > ResourceInfoProvider. Instead, I introduce the ExternalResourceDriver,
> > > > > > > > > > > > > which takes responsibility for all relevant operations on both the RM
> > > > > > > > > > > > > and TM sides.
> > > > > > > > > > > > > After a rethink about decoupling the management of external resources
> > > > > > > > > > > > > from TaskExecutor, I think we could do the same thing on the
> > > > > > > > > > > > > ResourceManager side. We do not need to add specific allocation
> > > > > > > > > > > > > logic to the ResourceManager each time we add a specific external
> > > > > > > > > > > > > resource.
> > > > > > > > > > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > > > > > > > > > containerRequest.
> > > > > > > > > > > > > - For Kubernetes, ExternalResourceDriver could provide a decorator for
> > > > > > > > > > > > > the TM pod.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In this way, just like MetricReporter, we allow users to define their
> > > > > > > > > > > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > > > > > > > > > > separation of concerns. For more details, please take a look at [1].
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > This sounds good to go ahead from my side.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I like the approach that Becket suggested - in that case the core
> > > > > > > > > > > > > > abstraction that everyone would need to understand would be "external
> > > > > > > > > > > > > > resource allocation" and the "ResourceInfoProvider", and the GPU specific
> > > > > > > > > > > > > > code would be a specific implementation only known to that component that
> > > > > > > > > > > > > > allocates the external resource. That fits the separation of concerns
> > > > > > > > > > > > > > well.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I also understand that it should not be over-engineered in the first
> > > > > > > > > > > > > > version, so some simplification makes sense, and then gradually expand
> > > > > > > > > > > > > > from there.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So +1 to go ahead with what was suggested above (Xintong / Becket) from
> > > > > > > > > > > > > > my side.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I see your concern, and I completely agree with you that we should
> > > > > > > > > > > > > > > first think about the "library" / "plugin" / "extension" style if
> > > > > > > > > > > > > > > possible.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If GPUs are sliced and assigned during scheduling, there may be
> > > > > > > > > > > > > > > > reason, although it looks that it would belong to the slot then. Is
> > > > > > > > > > > > > > > > that what we are doing here?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In the current proposal, we do not have the GPUs sliced and assigned to
> > > > > > > > > > > > > > > slots, because it could be problematic without dynamic slot allocation.
> > > > > > > > > > > > > > > E.g., the number of GPUs might not be evenly divisible by the number of
> > > > > > > > > > > > > > > slots.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think it makes sense to eventually have the GPUs assigned to slots.
> > > > > > > > > > > > > > > Even then, we might still need a TM level GPUManager (or
> > > > > > > > > > > > > > > ResourceProvider like Becket suggested). For memory, in each slot we can
> > > > > > > > > > > > > > > simply request the amount of memory, leaving it to JVM / OS to decide
> > > > > > > > > > > > > > > which memory (address) should be assigned. For GPU, and potentially
> > > > > > > > > > > > > > > other resources like FPGA, we need to explicitly specify which GPU
> > > > > > > > > > > > > > > (index) should be used. Therefore, we need some component at the TM
> > > > > > > > > > > > > > > level to coordinate which slot uses which GPU.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > IMO, unless we say Flink will not support slot-level GPU slicing at
> > > > > > > > > > > > > > > least in the foreseeable future, I don't see a good way to avoid
> > > > > > > > > > > > > > > touching the TM core. To that end, I think Becket's suggestion points to
> > > > > > > > > > > > > > > a good direction, that supports more features (GPU, FPGA, etc.) with
> > > > > > > > > > > > > > > less coupling to the TM core (only needs to understand the general
> > > > > > > > > > > > > > > interfaces). The detailed implementation for specific resource types can
> > > > > > > > > > > > > > > even be encapsulated as a library.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for sharing your thoughts on the final
> > > > > > > > > > > > > > > state. Regardless of the details of how the
> > > > > > > > > > > > > > > interfaces should look, I think this is a
> > > > > > > > > > > > > > > really good abstraction for supporting general
> > > > > > > > > > > > > > > resource types.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'd like to further clarify that the following
> > > > > > > > > > > > > > > three things are all that the "Flink core"
> > > > > > > > > > > > > > > needs to understand.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >    - The *amount* of resource, for scheduling.
> > > > > > > > > > > > > > >    Actually, we already have the Resource class
> > > > > > > > > > > > > > >    in ResourceProfile and ResourceSpec for
> > > > > > > > > > > > > > >    extended resources. It's just not really
> > > > > > > > > > > > > > >    used.
> > > > > > > > > > > > > > >    - The *info* that Flink provides to the
> > > > > > > > > > > > > > >    operators / user code.
> > > > > > > > > > > > > > >    - The *provider*, which generates the info
> > > > > > > > > > > > > > >    based on the amount.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The "core" does not need to understand the
> > > > > > > > > > > > > > > specific implementation details of the above
> > > > > > > > > > > > > > > three. They can even be implemented in a
> > > > > > > > > > > > > > > 3rd-party library, similar to how we allow
> > > > > > > > > > > > > > > users to define their custom MetricReporter.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <
> > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the comment, Stephan.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   - If everything becomes a "core feature", it
> will
> > > > > make
> > > > > > the
> > > > > > > > > > > project
> > > > > > > > > > > > hard
> > > > > > > > > > > > > > > > to develop in the future. Thinking "library"
> /
> > > > > > "plugin" /
> > > > > > > > > > > > "extension"
> > > > > > > > > > > > > > > style
> > > > > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Completely agree. It is much more important to
> > > design a
> > > > > > > > mechanism
> > > > > > > > > > > > than
> > > > > > > > > > > > > > > focusing on a specific case. Here is what I am
> > > thinking
> > > > > > to
> > > > > > > > fully
> > > > > > > > > > > > support
> > > > > > > > > > > > > > > custom resource management:
> > > > > > > > > > > > > > > 1. On the JM / RM side, use ResourceProfile and
> > > > > > ResourceSpec
> > > > > > > > to
> > > > > > > > > > > > define
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > resource and the amount required. They will be
> > > used to
> > > > > > find
> > > > > > > > > > > suitable
> > > > > > > > > > > > TMs
> > > > > > > > > > > > > > > slots to run the tasks. At this point, the
> > > resources
> > > > > are
> > > > > > only
> > > > > > > > > > > > measured by
> > > > > > > > > > > > > > > amount, i.e. they do not have individual ID.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. On the TM side, have something like
> > > > > > > > > > > > > > > > *"ResourceInfoProvider"* to identify and
> > > > > > > > > > > > > > > > provide the detailed information of each
> > > > > > > > > > > > > > > > individual resource, e.g. the GPU ID. This is
> > > > > > > > > > > > > > > > important because the operator may have to
> > > > > > > > > > > > > > > > explicitly interact with the physical resource
> > > > > > > > > > > > > > > > it uses. The ResourceInfoProvider might look
> > > > > > > > > > > > > > > > like something below.
> > > > > > > > > > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > > > > > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > > > > > > > > > > > > >         OperatorId opId,
> > > > > > > > > > > > > > > >         ResourceProfile resourceProfile);
> > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - There could be several
> > > > > > > > > > > > > > > > "*ResourceInfoProvider*" configured on the TM
> > > > > > > > > > > > > > > > to retrieve the information for different
> > > > > > > > > > > > > > > > resources.
> > > > > > > > > > > > > > > > - The TM will be responsible for assigning
> > > > > > > > > > > > > > > > those individual resources to each operator
> > > > > > > > > > > > > > > > according to their requested amount.
> > > > > > > > > > > > > > > > - The operators will be able to get the
> > > > > > > > > > > > > > > > ResourceInfo from their RuntimeContext.
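
The ResourceInfoProvider idea quoted above can be sketched in plain Java. This is only an illustration under stated assumptions: String identifiers stand in for Flink's AbstractID and operator ID types, an integer amount stands in for ResourceProfile, and GpuInfo and GpuInfoProvider are hypothetical names, not classes from the FLIP.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the per-resource info handed to operators.
final class GpuInfo {
    final int index;

    GpuInfo(int index) {
        this.index = index;
    }
}

// Simplified form of the interface proposed in the thread (assumption:
// String IDs replace AbstractID / operator IDs, an int replaces ResourceProfile).
interface ResourceInfoProvider<INFO> {
    Map<String, INFO> retrieveResourceInfo(String operatorId, int amount);
}

// Hypothetical provider that hands out GPU indices from a fixed pool,
// so that no two operators receive the same GPU.
final class GpuInfoProvider implements ResourceInfoProvider<GpuInfo> {
    private final List<Integer> free = new ArrayList<>();

    GpuInfoProvider(int totalGpus) {
        for (int i = 0; i < totalGpus; i++) {
            free.add(i);
        }
    }

    @Override
    public synchronized Map<String, GpuInfo> retrieveResourceInfo(String operatorId, int amount) {
        if (amount > free.size()) {
            throw new IllegalStateException("Not enough free GPUs for " + operatorId);
        }
        Map<String, GpuInfo> assigned = new HashMap<>();
        for (int i = 0; i < amount; i++) {
            int index = free.remove(0); // take the lowest free index
            assigned.put("gpu-" + index, new GpuInfo(index));
        }
        return assigned;
    }
}
```

In this sketch the scheduler side only ever deals with the amount, while the TM-side provider turns that amount into concrete device indices, which mirrors the split between steps 1 and 2 described in the quoted message.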
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If we agree this is a reasonable final state,
> > > > > > > > > > > > > > > > we can adapt the current FLIP to it. In fact,
> > > > > > > > > > > > > > > > it does not sound like a big change to me. All
> > > > > > > > > > > > > > > > the proposed configuration can stay as is; it
> > > > > > > > > > > > > > > > is just that Flink itself won't care about it.
> > > > > > > > > > > > > > > > Instead, a GPUInfoProvider implementing the
> > > > > > > > > > > > > > > > ResourceInfoProvider will use it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> > > > > > > > sewen@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi all!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The main point I wanted to throw into the
> > > discussion
> > > > > > is the
> > > > > > > > > > > > following:
> > > > > > > > > > > > > > > >   - With more and more use cases, more and
> more
> > > tools
> > > > > > go
> > > > > > > > into
> > > > > > > > > > > Flink
> > > > > > > > > > > > > > > >   - If everything becomes a "core feature",
> it
> > > will
> > > > > > make
> > > > > > > > the
> > > > > > > > > > > > project
> > > > > > > > > > > > > > hard
> > > > > > > > > > > > > > > > to develop in the future. Thinking "library"
> /
> > > > > > "plugin" /
> > > > > > > > > > > > "extension"
> > > > > > > > > > > > > > > style
> > > > > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   - A good thought experiment is always: How
> many
> > > > > > future
> > > > > > > > > > > developers
> > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > interact with this code (and possibly
> understand
> > > it
> > > > > > > > partially),
> > > > > > > > > > > > even if
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > features they touch have nothing to do with
> GPU
> > > > > > support. If
> > > > > > > > > > many
> > > > > > > > > > > > > > > > contributors to unrelated features will have
> to
> > > touch
> > > > > > it
> > > > > > > > and
> > > > > > > > > > > > understand
> > > > > > > > > > > > > > > it,
> > > > > > > > > > > > > > > > then let's think if there is a different
> > > solution.
> > > > > > Maybe
> > > > > > > > there
> > > > > > > > > > is
> > > > > > > > > > > > not,
> > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > then we should be sure why.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >   - That led me to raising this issue: If the
> > > > > > > > > > > > > > > > > GPU manager becomes a core service in the
> > > > > > > > > > > > > > > > > TaskManager, Environment, RuntimeContext,
> > > > > > > > > > > > > > > > > etc., then everyone developing TM and
> > > > > > > > > > > > > > > > > streaming tasks needs to understand the GPU
> > > > > > > > > > > > > > > > > manager. That seems oddly specific, is my
> > > > > > > > > > > > > > > > > impression.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Access to configuration does not seem to be
> > > > > > > > > > > > > > > > > the right reason to do that. We should expose
> > > > > > > > > > > > > > > > > the Flink configuration from the
> > > > > > > > > > > > > > > > > RuntimeContext anyways.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If GPUs are sliced and assigned during
> > > > > > > > > > > > > > > > > scheduling, there may be a reason, although
> > > > > > > > > > > > > > > > > it looks like it would belong to the slot
> > > > > > > > > > > > > > > > > then. Is that what we are doing here?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song
> <
> > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > IMO, eventually an operator should only see
> > > > > > > > > > > > > > > > > > info of GPUs that are dedicated to it,
> > > > > > > > > > > > > > > > > > instead of all GPUs on the
> > > > > > > > > > > > > > > > > > machine/container as in the current design.
> > > > > > > > > > > > > > > > > > It does not make sense to let the user who
> > > > > > > > > > > > > > > > > > writes a UDF worry about coordination among
> > > > > > > > > > > > > > > > > > multiple operators running on the same
> > > > > > > > > > > > > > > > > > machine. And if we want to limit the GPU
> > > > > > > > > > > > > > > > > > info an operator sees, we should not let the
> > > > > > > > > > > > > > > > > > operator instantiate GPUManager, which means
> > > > > > > > > > > > > > > > > > we have to expose something through the
> > > > > > > > > > > > > > > > > > runtime context, either GPU info or some
> > > > > > > > > > > > > > > > > > kind of limited access to the GPUManager.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin
> <
> > > > > > > > > > > becket.qin@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > It probably makes sense for us to first
> > > > > > > > > > > > > > > > > > > agree on the final state. More
> > > > > > > > > > > > > > > > > > > specifically, will the resource info be
> > > > > > > > > > > > > > > > > > > exposed through runtime context
> > > > > > > > > > > > > > > > > > > eventually?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > If that is the final state and we have a
> > > > > > > > > > > > > > > > > > > seamless migration story from this FLIP
> > > > > > > > > > > > > > > > > > > to that final state, personally I think
> > > > > > > > > > > > > > > > > > > it is OK to expose the GPU info in the
> > > > > > > > > > > > > > > > > > > runtime context.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong
> > > Song <
> > > > > > > > > > > > > > tonysong820@gmail.com
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > > > > > > > > I think what Stephan means (@Stephan,
> > > please
> > > > > > correct
> > > > > > > > me
> > > > > > > > > > if
> > > > > > > > > > > > I'm
> > > > > > > > > > > > > > > wrong)
> > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > that, we might not need to hold and
> > > maintain
> > > > > the
> > > > > > > > > > GPUManager
> > > > > > > > > > > > as a
> > > > > > > > > > > > > > > > > service
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > TaskManagerServices or RuntimeContext.
> An
> > > > > > > > alternative is
> > > > > > > > > > to
> > > > > > > > > > > > > > create
> > > > > > > > > > > > > > > /
> > > > > > > > > > > > > > > > > > > retrieve the GPUManager only in the
> > > operators
> > > > > > that
> > > > > > > > need
> > > > > > > > > > it,
> > > > > > > > > > > > e.g.,
> > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > > > > > > > > I agree with you on excluding
> GPUManager
> > > from
> > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >    - For the first step, where we
> provide
> > > > > unified
> > > > > > > > > > TM-level
> > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > information
> > > > > > > > > > > > > > > > > > >    to all operators, it should be fine
> to
> > > have
> > > > > > > > operators
> > > > > > > > > > > > access /
> > > > > > > > > > > > > > > > > > >    lazy-initiate GPUManager by
> themselves.
> > > > > > > > > > > > > > > > > > >    - In future, we might have some more
> > > > > > fine-grained
> > > > > > > > GPU
> > > > > > > > > > > > > > > management,
> > > > > > > > > > > > > > > > > > where
> > > > > > > > > > > > > > > > > > >    we need to maintain GPUManager as a
> > > service
> > > > > > and
> > > > > > > > put
> > > > > > > > > > GPU
> > > > > > > > > > > > info
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > slot
> > > > > > > > > > > > > > > > > > >    profiles. But at least for now it's
> not
> > > > > > necessary
> > > > > > > > to
> > > > > > > > > > > > introduce
> > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > >    complexity.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > However, I have some concerns on
> excluding
> > > > > > GPUManager
> > > > > > > > > > from
> > > > > > > > > > > > > > > > > RuntimeContext
> > > > > > > > > > > > > > > > > > > and let operators access it directly.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >    - Configurations needed for
> creating the
> > > > > > > > GPUManager is
> > > > > > > > > > > not
> > > > > > > > > > > > > > > always
> > > > > > > > > > > > > > > > > > >    available for operators.
> > > > > > > > > > > > > > > > > > >    - If later we want to have
> fine-grained
> > > > > > control
> > > > > > > > over
> > > > > > > > > > GPU
> > > > > > > > > > > > > > (e.g.,
> > > > > > > > > > > > > > > > > > >    operators in each slot can only see
> GPUs
> > > > > > reserved
> > > > > > > > for
> > > > > > > > > > > that
> > > > > > > > > > > > > > > slot),
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I would suggest wrapping the GPUManager
> > > > > > > > > > > > > > > > > > > > behind RuntimeContext and only exposing
> > > > > > > > > > > > > > > > > > > > the GPUInfo to users. For now, we can
> > > > > > > > > > > > > > > > > > > > declare a method `getGPUInfo()` in
> > > > > > > > > > > > > > > > > > > > RuntimeContext, with a default
> > > > > > > > > > > > > > > > > > > > definition that calls
> > > > > > > > > > > > > > > > > > > > `GPUManager.get()` to get the
> > > > > > > > > > > > > > > > > > > > lazily-created GPUManager. If later we
> > > > > > > > > > > > > > > > > > > > want to create / retrieve GPUManager in
> > > > > > > > > > > > > > > > > > > > a different way, we can simply change
> > > > > > > > > > > > > > > > > > > > how `getGPUInfo` is implemented, without
> > > > > > > > > > > > > > > > > > > > needing to change any public interfaces.
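
A minimal sketch of the `getGPUInfo()` suggestion quoted above, assuming a simplified RuntimeContext interface and a hypothetical GPUManager whose discovery step is stubbed out; none of these signatures are taken from Flink itself.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical value type exposed to user code.
final class GPUInfo {
    final List<Integer> indices;

    GPUInfo(List<Integer> indices) {
        this.indices = indices;
    }
}

// Hypothetical manager, created lazily on first use.
final class GPUManager {
    private static GPUManager instance;
    private final GPUInfo info;

    private GPUManager() {
        // Assumption: a real implementation would run the discovery script here.
        // This stub simply reports a single GPU with index 0.
        this.info = new GPUInfo(Collections.singletonList(0));
    }

    static synchronized GPUManager get() {
        if (instance == null) {
            instance = new GPUManager();
        }
        return instance;
    }

    GPUInfo getInfo() {
        return info;
    }
}

// Simplified stand-in for the runtime context seen by user functions.
interface RuntimeContext {
    // Default definition delegates to the lazily created manager; user code
    // only ever sees GPUInfo, never the manager itself.
    default GPUInfo getGPUInfo() {
        return GPUManager.get().getInfo();
    }
}
```

Because user code depends only on `getGPUInfo()`, the way the manager is created or scoped (per JVM, per TaskExecutor, or per slot) could change later without touching the public method, which is the point made in the quoted paragraph.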
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze
> > > Guo <
> > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > > > > > > > > Do you mean MiniCluster? Yes, it makes
> > > > > > > > > > > > > > > > > > > > > sense to share the GPU Manager in such a
> > > > > > > > > > > > > > > > > > > > > scenario.
> > > > > > > > > > > > > > > > > > > > > If that's what you worry about, I'm +1
> > > > > > > > > > > > > > > > > > > > > for holding
> > > > > > > > > > > > > > > > > > > > > GPUManager(ExternalResourceManagers) in
> > > > > > > > > > > > > > > > > > > > > TaskExecutor instead of
> > > > > > > > > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Regarding the
> > > RuntimeContext/FunctionContext,
> > > > > > it
> > > > > > > > just
> > > > > > > > > > > > holds the
> > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > info instead of the GPU Manager.
> AFAIK,
> > > it's
> > > > > > the
> > > > > > > > only
> > > > > > > > > > > > place we
> > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > > pass GPU info to the
> > > > > > > > RichFunction/UserDefinedFunction.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac
> > > > > Godfried
> > > > > > <
> > > > > > > > > > > > > > > > isaac@paddlesoft.net
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20
> +0000
> > > > > > > > > > > sewen@apache.org
> > > > > > > > > > > > > > wrote
> > > > > > > > > > > > > > > > > ----
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Can we somehow keep this out
> of the
> > > > > > > > TaskManager
> > > > > > > > > > > > services
> > > > > > > > > > > > > > > > > > > > > > I fear that we could not. IMO,
> the
> > > > > > > > GPUManager(or
> > > > > > > > > > > > > > > > > > > > > > ExternalServicesManagers in
> future)
> > > is
> > > > > > > > conceptually
> > > > > > > > > > > > one of
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > task
> > > > > > > > > > > > > > > > > > > > > > manager services, just like
> > > MemoryManager
> > > > > > > > before
> > > > > > > > > > > 1.10.
> > > > > > > > > > > > > > > > > > > > > > - It maintains/holds the GPU
> > > resource at
> > > > > TM
> > > > > > > > level
> > > > > > > > > > and
> > > > > > > > > > > > all
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > operators allocate the GPU
> resources
> > > from
> > > > > > it.
> > > > > > > > So,
> > > > > > > > > > it
> > > > > > > > > > > > should
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > exclusive to a single
> TaskExecutor.
> > > > > > > > > > > > > > > > > > > > > > - We could add a collection
> called
> > > > > > > > > > > > ExternalResourceManagers
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > hold
> > > > > > > > > > > > > > > > > > > > > > all managers of other external
> > > resources
> > > > > > in the
> > > > > > > > > > > future.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Can you help me understand why this
> > > > > > > > > > > > > > > > > > > > > > needs the addition in
> > > > > > > > > > > > > > > > > > > > > > TaskManagerServices or in the
> > > > > > > > > > > > > > > > > > > > > > RuntimeContext?
> > > > > > > > > > > > > > > > > > > > > Are you worried about the case when
> > > > > multiple
> > > > > > Task
> > > > > > > > > > > > Executors
> > > > > > > > > > > > > > run
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > JVM? That's not common, but
> wouldn't it
> > > > > > actually
> > > > > > > > be
> > > > > > > > > > > good
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > case
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > share the GPU Manager, given that
> the
> > > GPU
> > > > > is
> > > > > > > > shared?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > What parts need information about
> > > this?
> > > > > > > > > > > > > > > > > > > > > > In this FLIP, operators need the
> > > > > > information.
> > > > > > > > Thus,
> > > > > > > > > > > we
> > > > > > > > > > > > > > expose
> > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > > information to the
> > > > > > > > RuntimeContext/FunctionContext.
> > > > > > > > > > > The
> > > > > > > > > > > > slot
> > > > > > > > > > > > > > > > > profile
> > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > not aware of GPU resources as
> GPU is
> > > TM
> > > > > > level
> > > > > > > > > > > resource
> > > > > > > > > > > > now.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Can the GPU Manager be a "self
> > > > > contained"
> > > > > > > > thing
> > > > > > > > > > > that
> > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > configuration, and then abstracts
> > > > > > everything
> > > > > > > > > > > > internally?
> > > > > > > > > > > > > > > > > > > > > > > Yes, we just pass the path/args of
> > > > > > > > > > > > > > > > > > > > > > > the discovery script and how many
> > > > > > > > > > > > > > > > > > > > > > > GPUs per TM to it. It takes the
> > > > > > > > > > > > > > > > > > > > > > > responsibility to get the GPU
> > > > > > > > > > > > > > > > > > > > > > > information and expose it to the
> > > > > > > > > > > > > > > > > > > > > > > RuntimeContext/FunctionContext of
> > > > > > > > > > > > > > > > > > > > > > > operators. Meanwhile, we'd better
> > > > > > > > > > > > > > > > > > > > > > > not allow operators to directly
> > > > > > > > > > > > > > > > > > > > > > > access GPUManager; they should get
> > > > > > > > > > > > > > > > > > > > > > > what they want from the Context.
> > > > > > > > > > > > > > > > > > > > > > > We could then decouple the
> > > > > > > > > > > > > > > > > > > > > > > interface/implementation of
> > > > > > > > > > > > > > > > > > > > > > > GPUManager from the public API.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM
> > > Stephan
> > > > > > Ewen <
> > > > > > > > > > > > > > > sewen@apache.org
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > It sounds fine to initially
> start
> > > with
> > > > > > GPU
> > > > > > > > > > specific
> > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > > > > > > generalizing this once we
> better
> > > > > > understand
> > > > > > > > the
> > > > > > > > > > > > space.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > About the implementation
> suggested
> > > in
> > > > > > > > FLIP-108:
> > > > > > > > > > > > > > > > > > > > > > > - Can we somehow keep this out
> of
> > > the
> > > > > > > > TaskManager
> > > > > > > > > > > > > > services?
> > > > > > > > > > > > > > > > > > > Anything
> > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > > have to pull through all
> layers of
> > > the
> > > > > TM
> > > > > > > > makes
> > > > > > > > > > the
> > > > > > > > > > > > TM
> > > > > > > > > > > > > > > > > components
> > > > > > > > > > > > > > > > > > > yet
> > > > > > > > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > - What parts need information
> about
> > > > > this?
> > > > > > > > > > > > > > > > > > > > > > > -> do the slot profiles need
> > > > > information
> > > > > > > > about
> > > > > > > > > > the
> > > > > > > > > > > > GPU?
> > > > > > > > > > > > > > > > > > > > > > > -> Can the GPU Manager be a
> "self
> > > > > > contained"
> > > > > > > > > > thing
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > > > > > > > the configuration, and then
> > > abstracts
> > > > > > > > everything
> > > > > > > > > > > > > > > internally?
> > > > > > > > > > > > > > > > > > > > Operators
> > > > > > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > > > > access it via
> "GPUManager.get()"
> > > or so?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM
> > > Yangze
> > > > > > Guo <
> > > > > > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > > > > > > > Regarding the WebUI and
> GPUInfo,
> > > > > you're
> > > > > > > > right,
> > > > > > > > > > > > I'll add
> > > > > > > > > > > > > > > > them
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > > > > > > > > Regarding the general
> extended
> > > > > resource
> > > > > > > > > > > mechanism,
> > > > > > > > > > > > I
> > > > > > > > > > > > > > > second
> > > > > > > > > > > > > > > > > > > > Xintong's
> > > > > > > > > > > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > > > > > > > > > > > - It's better to leverage
> > > > > > > > > > > > > > > > > > > > > > > > > ResourceProfile and ResourceSpec
> > > > > > > > > > > > > > > > > > > > > > > > > once we support fine-grained
> > > > > > > > > > > > > > > > > > > > > > > > > GPU scheduling. As a first-step
> > > > > > > > > > > > > > > > > > > > > > > > > proposal, I prefer not to
> > > > > > > > > > > > > > > > > > > > > > > > > include it in the scope of this
> > > > > > > > > > > > > > > > > > > > > > > > > FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > > - Regarding the "Extended
> > > > > > > > > > > > > > > > > > > > > > > > > Resource Manager", if I
> > > > > > > > > > > > > > > > > > > > > > > > > understand correctly, it is
> > > > > > > > > > > > > > > > > > > > > > > > > just a code refactoring atm; we
> > > > > > > > > > > > > > > > > > > > > > > > > could extract the
> > > > > > > > > > > > > > > > > > > > > > > > > open/close/allocateExtendResources
> > > > > > > > > > > > > > > > > > > > > > > > > of GPUManager to that
> > > > > > > > > > > > > > > > > > > > > > > > > interface. If that is the case,
> > > > > > > > > > > > > > > > > > > > > > > > > +1 to do it during
> > > > > > > > > > > > > > > > > > > > > > > > > implementation.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark supports a general "Custom
> > > > > > > > > > > > > > > > > > > > > > > > > Resource Scheduling" before and decided to introduce a common
> > > > > > > > > > > > > > > > > > > > > > > > > resource configuration schema
> > > > > > > > > > > > > > > > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > > > > > > > > > to make it more extensible. I think "resource" is the proper level
> > > > > > > > > > > > > > > > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > > > > > > > > > > > > > > >
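To illustrate the key schema quoted above, here is a minimal Java sketch that builds the two per-resource keys from a resource name. The class and method names are hypothetical and are not part of Flink; only the key pattern taskmanager.resource.{resourceName}.amount/discovery-script is taken from the thread.

```java
/** Hypothetical helper (not a Flink class): builds keys that follow the schema quoted in the thread. */
final class ExternalResourceKeys {
    // Common prefix of the proposed schema.
    private static final String PREFIX = "taskmanager.resource.";

    private ExternalResourceKeys() {}

    /** Key for the amount of the named resource, e.g. "gpu". */
    static String amountKey(String resourceName) {
        return PREFIX + resourceName + ".amount";
    }

    /** Key for the discovery script of the named resource. */
    static String discoveryScriptKey(String resourceName) {
        return PREFIX + resourceName + ".discovery-script";
    }
}
```

Under this assumption, a new resource type such as "fpga" would need no new key definitions, only a different resource name.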
> > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hxbks2ks@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > There is no doubt that GPU resource management support will greatly
> > > > > > > > > > > > > > > > > > > > > > > > > > facilitate the development of AI-related applications by PyFlink
> > > > > > > > > > > > > > > > > > > > > > > > > > users.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Regarding the names of several GPU configurations, I think it is
> > > > > > > > > > > > > > > > > > > > > > > > > > better to delete the resource field, which makes them consistent
> > > > > > > > > > > > > > > > > > > > > > > > > > with the names of other resource-related configurations in
> > > > > > > > > > > > > > > > > > > > > > > > > > TaskManagerOption.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline discussion about
> > > > > > > > > > > > > > > > > > > > > > > > > > > making the "GPU Support" a more general "Extended Resource
> > > > > > > > > > > > > > > > > > > > > > > > > > > Support". We believe supporting extended resources through a
> > > > > > > > > > > > > > > > > > > > > > > > > > > general mechanism is definitely a good and extensible way. The
> > > > > > > > > > > > > > > > > > > > > > > > > > > reason we propose this FLIP with its scope narrowed down to GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > alone is mainly the concern about the extra effort and review
> > > > > > > > > > > > > > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > To come up with a good design for a general extended resource
> > > > > > > > > > > > > > > > > > > > > > > > > > > management mechanism, we would need to investigate more on how
> > > > > > > > > > > > > > > > > > > > > > > > > > > people use different kinds of resources in practice. For GPU, we
> > > > > > > > > > > > > > > > > > > > > > > > > > > learnt such knowledge from the experts, Becket and his team
> > > > > > > > > > > > > > > > > > > > > > > > > > > members. But for FPGA, or other potential extended resources, we
> > > > > > > > > > > > > > > > > > > > > > > > > > > don't have such convenient information sources, so the
> > > > > > > > > > > > > > > > > > > > > > > > > > > investigation would require more effort, which I tend to think is
> > > > > > > > > > > > > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On the other hand, we also looked into how Spark supports a general
> > > > > > > > > > > > > > > > > > > > > > > > > > > "Custom Resource Scheduling". Assuming we want to have a similar
> > > > > > > > > > > > > > > > > > > > > > > > > > > general extended resource mechanism in the future, we believe that
> > > > > > > > > > > > > > > > > > > > > > > > > > > the current GPU support design can be easily extended in an
> > > > > > > > > > > > > > > > > > > > > > > > > > > incremental way, without too much rework.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > - The most important part is probably user interfaces. Spark offers
> > > > > > > > > > > > > > > > > > > > > > > > > > > configuration options to define the amount, discovery script and
> > > > > > > > > > > > > > > > > > > > > > > > > > > vendor (on k8s) on a per-resource-type basis [1], which is very
> > > > > > > > > > > > > > > > > > > > > > > > > > > similar to what we proposed in this FLIP. I think it's not
> > > > > > > > > > > > > > > > > > > > > > > > > > > necessary to expose config options in the general way atm, since
> > > > > > > > > > > > > > > > > > > > > > > > > > > we do not have support for other resource types now. If later we
> > > > > > > > > > > > > > > > > > > > > > > > > > > decide to have per-resource-type config options, we can keep
> > > > > > > > > > > > > > > > > > > > > > > > > > > backwards compatibility with the currently proposed options
> > > > > > > > > > > > > > > > > > > > > > > > > > > through simple key mapping.
> > > > > > > > > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if needed later we can change it to an
> > > > > > > > > > > > > > > > > > > > > > > > > > > "Extended Resource Manager" (or whatever it is called). That
> > > > > > > > > > > > > > > > > > > > > > > > > > > should be a pure component-internal refactoring.
> > > > > > > > > > > > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are already fields
> > > > > > > > > > > > > > > > > > > > > > > > > > > for general extended resources. We can of course leverage them
> > > > > > > > > > > > > > > > > > > > > > > > > > > when supporting fine-grained GPU scheduling. That is also not in
> > > > > > > > > > > > > > > > > > > > > > > > > > > the scope of this first-step proposal, and would require FLIP-56
> > > > > > > > > > > > > > > > > > > > > > > > > > > to be finished first.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate FLIP
> > > > > > > > > > > > > > > > > > > > > > > > > > > for the general extended resource mechanism, and keep it in mind
> > > > > > > > > > > > > > > > > > > > > > > > > > > when discussing and implementing the current one.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes total sense to generalize
> > > > > > > > > > > > > > > > > > > > > > > > > > > > the resource management to support custom resources. Having that
> > > > > > > > > > > > > > > > > > > > > > > > > > > > allows users to add new resources by themselves. The general
> > > > > > > > > > > > > > > > > > > > > > > > > > > > resource management may involve two different aspects:
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type definition. It is supported by the
> > > > > > > > > > > > > > > > > > > > > > > > > > > > extended resources in ResourceProfile and ResourceSpec. This will
> > > > > > > > > > > > > > > > > > > > > > > > > > > > likely cover the majority of the cases.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > > > > > > > > > > > > > > > > > > > > > > > > > > resources to different tasks, operators, and so on. This may
> > > > > > > > > > > > > > > > > > > > > > > > > > > > require two levels / steps:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put into suitable
> > > > > > > > > > > > > > > > > > > > > > > > > > > > slots. It is done by the global RM and is not customizable right
> > > > > > > > > > > > > > > > > > > > > > > > > > > > now.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource to the operators in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > the TM, e.g. GPU 1 for operator A, GPU 2 for operator B. This
> > > > > > > > > > > > > > > > > > > > > > > > > > > > step is needed assuming the global RM does not distinguish
> > > > > > > > > > > > > > > > > > > > > > > > > > > > individual resources of the same type. It is true for memory,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > but not for GPU.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
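The operator-level step (2.b) described above can be sketched in Java as follows. This is only an illustration under stated assumptions: OperatorGpuAssigner and its methods are hypothetical names, not the FLIP's GPUManager API, and GPUs are assumed to be identified by integer indexes reported by a discovery step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of step 2.b: hands out distinct GPU indexes to operators inside one TM. */
final class OperatorGpuAssigner {
    // GPU indexes that are currently not bound to any operator.
    private final Deque<Integer> freeGpuIndexes = new ArrayDeque<>();

    OperatorGpuAssigner(Collection<Integer> discoveredGpuIndexes) {
        freeGpuIndexes.addAll(discoveredGpuIndexes);
    }

    /** Returns `count` distinct GPU indexes, or fails if fewer are free. */
    synchronized List<Integer> allocate(int count) {
        if (count < 0 || count > freeGpuIndexes.size()) {
            throw new IllegalStateException(
                    "Requested " + count + " GPUs but only " + freeGpuIndexes.size() + " are free");
        }
        List<Integer> assigned = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            assigned.add(freeGpuIndexes.poll());
        }
        return assigned;
    }

    /** Gives GPU indexes back once an operator is closed. */
    synchronized void release(Collection<Integer> gpuIndexes) {
        freeGpuIndexes.addAll(gpuIndexes);
    }
}
```

With two discovered GPUs, operator A and operator B would each call allocate(1) and receive different indexes; a third request fails until one of them releases its GPU.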
> > > > > > > > > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it should discover
> > > > > > > > > > > > > > > > > > > > > > > > > > > > the physical GPU information and bind/match them to each
> > > > > > > > > > > > > > > > > > > > > > > > > > > > operator. Making this general will fill in the missing piece to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > support custom resource type definition. But I'd avoid calling it
> > > > > > > > > > > > > > > > > > > > > > > > > > > > an "External Resource Manager" to avoid confusion with RM; maybe
> > > > > > > > > > > > > > > > > > > > > > > > > > > > something like "Operator Resource Assigner" would be more
> > > > > > > > > > > > > > > > > > > > > > > > > > > > accurate. So for each resource type users can have an optional
> > > > > > > > > > > > > > > > > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For memory, users don't
> > > > > > > > > > > > > > > > > > > > > > > > > > > > need this, but for other extended resources, users may need that.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > achievable in this FLIP. But I am also OK with having that in a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > separate FLIP because the interface between the "Operator
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Resource Assigner" and operator may take a while to settle down
> > > > > > > > > > > > > > > > > > > > > > > > > > > > if we want to make it generic. But I think our implementation
> > > > > > > > > > > > > > > > > > > > > > > > > > > > should take this future work into consideration so that we don't
> > > > > > > > > > > > > > > > > > > > > > > > > > > > need to break backwards compatibility once we have that.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > scheduling and GPU allocation, as I have no experience with
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the proposal is if it makes sense
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > to look at the "GPU Manager" as an "External Resource Manager",
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > and GPU is one such resource.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > is how it is done there.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > It has the advantage that it looks more extensible. Maybe there
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > is a GPU Resource, a specialized NVIDIA GPU Resource, an FPGA
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Resource, an Alibaba TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > a must-have for machine learning use cases. Actually it is one
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > of the most frequently asked questions from the users who are
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > interested in using Flink for ML.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions on the wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the public interface section.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket)
> Qin
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > is significant, especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > think it's a very good first step for Flink's GPU support.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a task executor and
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > forward such requirements to the external resource managers (for
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of available GPU resources to operators.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and expose GPU resource information to the context of functions.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU discovery, in which we provide
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the privilege mode to help user to achieve worker-level isolation in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Yangze Guo <ka...@gmail.com>.
Hi everyone,
I've updated the FLIP accordingly. The key change is replacing the two
resource allocation interfaces with config options.
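
For illustration, here is a minimal sketch of what the config-only
approach could look like on the RM side. It assumes the option names
discussed in this thread (external-resource.{resourceName}.amount and
external-resource.{resourceName}.yarn/kubernetes.key). The class and
method names (ExternalResourceConfig, requestsFor) and the resource key
used in the demo are hypothetical and not part of the FLIP.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a Flink class): turns config entries into resource requests. */
final class ExternalResourceConfig {
    // Prefix taken from the option names discussed in the thread.
    static final String PREFIX = "external-resource.";

    private ExternalResourceConfig() {}

    /**
     * For each resource name, looks up "<prefix><name>.amount" and
     * "<prefix><name>.<deployment>.key" and maps the deployment-specific key
     * to the configured amount. Resources missing either entry are skipped.
     */
    static Map<String, Long> requestsFor(
            String deployment, Iterable<String> resourceNames, Map<String, String> conf) {
        Map<String, Long> requests = new LinkedHashMap<>();
        for (String name : resourceNames) {
            String amount = conf.get(PREFIX + name + ".amount");
            String key = conf.get(PREFIX + name + "." + deployment + ".key");
            if (amount == null || key == null) {
                continue; // resource not configured for this deployment
            }
            requests.put(key, Long.parseLong(amount.trim()));
        }
        return requests;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("external-resource.gpu.amount", "2");
        // Example value only; the actual key depends on the cluster setup.
        conf.put("external-resource.gpu.kubernetes.key", "nvidia.com/gpu");
        System.out.println(requestsFor("kubernetes", Arrays.asList("gpu"), conf));
        System.out.println(requestsFor("yarn", Arrays.asList("gpu"), conf));
    }
}
```

With this shape the RM only reads strings from the configuration, so no
user-defined class has to be loaded there.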

If there are no further comments, I would like to start a voting
thread by tomorrow.

Best,
Yangze Guo

On Mon, Mar 30, 2020 at 9:15 PM Till Rohrmann <tr...@apache.org> wrote:
>
> If there is no need for the ExternalResourceDriver on the RM side, then it
> is always a good idea to keep things simple and not introduce it. One can
> always change things once one realizes that there is a need for it.
>
> Cheers,
> Till
>
> On Mon, Mar 30, 2020 at 12:00 PM Yangze Guo <ka...@gmail.com> wrote:
>
> > Hi @Till, @Xintong
> >
> > Even without the credential concerns, I think replacing the interfaces
> > with configuration options is a good idea.
> > - Currently, I don't see any external resource that is incompatible
> > with this mechanism.
> > - It relieves users of the burden of implementing a plugin themselves.
> > WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Mar 30, 2020 at 5:44 PM Xintong Song <to...@gmail.com>
> > wrote:
> > >
> > > I also agree that the pluggable ExternalResourceDriver should be loaded by
> > > the cluster class loader. Although the plugin might be implemented by users,
> > > external resources (as part of task executor resources) should be cluster
> > > configurations, unlike job-level user code such as UDFs, because the task
> > > executors belong to the cluster rather than to jobs.
> > >
> > >
> > > IIUC, the concern Stephan raised is about the potential credential problem
> > > when executing user code on the RM with the cluster class loader. The concern
> > > makes sense to me, and I think what Yangze suggested is a good
> > > approach to preventing such credential problems. The only reason we
> > > tried to execute user code (i.e. getKubernetes/YarnExternalResource) on the RM
> > > was that we need to set these key-value pairs on pod/container requests.
> > > By replacing the interfaces getKubernetes/YarnExternalResource with the
> > > configuration options
> > > 'external-resource.{resourceName}.yarn/kubernetes.key/amount',
> > > we can still fulfill that purpose without the credential risks.
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann <tr...@apache.org>
> > wrote:
> > >
> > > > At the moment the RM does not have a user code class loader and I agree
> > > > with Stephan that it should stay like this. This, however, does not mean
> > > > that we cannot support pluggable components in the RM. As long as the
> > > > plugins are on the system's class path, it should be fine for the RM to
> > > > load them. For example, we could add external resources via Flink's plugin
> > > > mechanism or something similar.
> > > >
> > > > A very simple implementation of such an ExternalResourceDriver could be a
> > > > class which simply returns what is written in the flink-conf.yaml under a
> > > > given key.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo <ka...@gmail.com> wrote:
> > > >
> > > > > Hi, Stephan,
> > > > >
> > > > > I see your concern and I totally agree with you.
> > > > >
> > > > > The interface on the RM side is now `Map<String key, String/Long value>
> > > > > getYarn/KubernetesExternalResource()`. The only valid information the RM
> > > > > gets from it is the configuration key of that external resource in
> > > > > Yarn/K8s. The "String/Long value" would be the same as
> > > > > external-resource.{resourceName}.amount.
> > > > > So, I think it makes sense to replace these two interfaces with two
> > > > > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> > > > > may lose some extensibility, but AFAIK it could work with common
> > > > > external resources like GPU and FPGA. WDYT?
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen <se...@apache.org>
> > wrote:
> > > > > >
> > > > > > Maybe one final comment: It is probably not an issue, but let's try and
> > > > > > keep user code (via user code classloader) out of the ResourceManager, if
> > > > > > possible.
> > > > > >
> > > > > > As background:
> > > > > >
> > > > > > There were thoughts in the past to support setups where the RM must run
> > > > > > with "superuser" credentials, but we cannot run JM/TM with these
> > > > > > credentials, as the user code might access them otherwise.
> > > > > > This is actually possible today, you can run the RM in a different JVM or
> > > > > > in a different container, and give it more credentials than JMs / TMs. But
> > > > > > for this to be feasible, we cannot allow any user-defined code to be in the
> > > > > > JVM, because that instantaneously breaks the isolation of credentials.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <ka...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Thanks for the feedback, @Till and @Xintong.
> > > > > > >
> > > > > > > Regarding separating the interface, I'm also +1 with it.
> > > > > > >
> > > > > > > Regarding the resource allocation interface, true, it's dangerous to
> > > > > > > give much access to user code. Changing the return type to Map<String
> > > > > > > key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > > > > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > > > > > could also free us from the potential dependency issue.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <
> > tonysong820@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks for updating the FLIP, Yangze.
> > > > > > > >
> > > > > > > > I agree with Till that we probably want to separate the K8s/Yarn decorator
> > > > > > > > calls. Users can still configure one driver class, and we can use
> > > > > > > > `instanceof` to check whether the driver implements the K8s/Yarn-specific
> > > > > > > > interfaces.
> > > > > > > >
> > > > > > > > Moreover, I'm not sure about exposing the entire `ContainerRequest` / `Pod`
> > > > > > > > (`AbstractKubernetesStepDecorator` directly manipulates `Pod`) to user
> > > > > > > > code. It gives user code more access than needed for defining an external
> > > > > > > > resource, which might cause problems. Instead, I would suggest having an
> > > > > > > > interface like `Map<String key, String value>
> > > > > > > > getYarn/KubernetesExternalResource()` and assembling the result into
> > > > > > > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
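
A minimal sketch of the interface segregation described above, assuming
one deployment-agnostic base interface plus one optional interface per
deployment target. Only the method names
getYarn/KubernetesExternalResource() come from this thread; every type
name below (ExternalResourceDriver as written here,
KubernetesExternalResource, YarnExternalResource,
KubernetesOnlyGpuDriver, DriverDemo) and the "nvidia.com/gpu" key are
illustrative assumptions, not Flink APIs.

```java
import java.util.Collections;
import java.util.Map;

/** Demo entry point: the RM side checks capabilities via instanceof. */
final class DriverDemo {
    /** Entries for a Kubernetes pod request, or empty if the driver does not support K8s. */
    static Map<String, String> kubernetesEntries(ExternalResourceDriver driver) {
        if (driver instanceof KubernetesExternalResource) {
            return ((KubernetesExternalResource) driver).getKubernetesExternalResource();
        }
        return Collections.emptyMap();
    }

    /** Entries for a Yarn container request, or empty if the driver does not support Yarn. */
    static Map<String, String> yarnEntries(ExternalResourceDriver driver) {
        if (driver instanceof YarnExternalResource) {
            return ((YarnExternalResource) driver).getYarnExternalResource();
        }
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        ExternalResourceDriver gpu = new KubernetesOnlyGpuDriver(2);
        System.out.println("k8s:  " + kubernetesEntries(gpu));
        System.out.println("yarn: " + yarnEntries(gpu));
    }
}

/** Base interface: knows nothing about any deployment target. */
interface ExternalResourceDriver {
    String resourceName();
}

/** Optional capability: implemented only by drivers that support Kubernetes. */
interface KubernetesExternalResource {
    Map<String, String> getKubernetesExternalResource();
}

/** Optional capability: implemented only by drivers that support Yarn. */
interface YarnExternalResource {
    Map<String, String> getYarnExternalResource();
}

/** Example driver that supports Kubernetes but deliberately not Yarn. */
final class KubernetesOnlyGpuDriver implements ExternalResourceDriver, KubernetesExternalResource {
    private final long amount;

    KubernetesOnlyGpuDriver(long amount) {
        this.amount = amount;
    }

    @Override
    public String resourceName() {
        return "gpu";
    }

    @Override
    public Map<String, String> getKubernetesExternalResource() {
        // Plain key/value pair; the RM would assemble it into the pod request.
        return Collections.singletonMap("nvidia.com/gpu", Long.toString(amount));
    }
}
```

Because the driver hands back only strings, it needs neither Hadoop nor
Kubernetes client classes on its classpath, and it never sees the
`ContainerRequest` / `Pod` objects themselves.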
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <
> > > > trohrmann@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > I'm a bit late to the party. I think the current proposal looks good.
> > > > > > > > >
> > > > > > > > > Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> > > > > > > > > would suggest not including the decorator calls for Kubernetes and Yarn in
> > > > > > > > > the base interface. Instead, I would suggest segregating the
> > > > > > > > > deployment-specific decorator calls into separate interfaces. That way an
> > > > > > > > > ExternalResourceDriver does not have to support all deployments from the
> > > > > > > > > very beginning. Moreover, some resources might not be supported by a
> > > > > > > > > specific deployment target, and the natural way to express this would be to
> > > > > > > > > not implement the respective deployment-specific interface.
> > > > > > > > >
> > > > > > > > > Moreover, having void
> > > > > > > > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > > > > > > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > > > > > > > classpath whenever the external resource driver is being used.
> > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <
> > sewen@apache.org>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Nice, thanks a lot!
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <
> > > > karmagyz@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > > > > > > > >
> > > > > > > > > > > I've updated the FLIP accordingly. I did not add a
> > > > > > > > > > > ResourceInfoProvider. Instead, I introduced the ExternalResourceDriver,
> > > > > > > > > > > which takes responsibility for all relevant operations on both the RM
> > > > > > > > > > > and TM sides.
> > > > > > > > > > > After rethinking how to decouple the management of external resources
> > > > > > > > > > > from the TaskExecutor, I think we could do the same thing on the
> > > > > > > > > > > ResourceManager side. We do not need to add specific allocation
> > > > > > > > > > > logic to the ResourceManager each time we add a specific external
> > > > > > > > > > > resource.
> > > > > > > > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > > > > > > > containerRequest.
> > > > > > > > > > > - For Kubernetes, the ExternalResourceDriver could provide a decorator for
> > > > > > > > > > > the TM pod.
> > > > > > > > > > >
> > > > > > > > > > > In this way, just like MetricReporter, we allow users to define their
> > > > > > > > > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > > > > > > > > separation of concerns. For more details, please take a look at [1].
> > > > > > > > > > >
> > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Yangze Guo
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <
> > > > sewen@apache.org
> > > > > >
> > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > This sounds good to go ahead from my side.
> > > > > > > > > > > >
> > > > > > > > > > > > I like the approach that Becket suggested - in that case the core
> > > > > > > > > > > > abstraction that everyone would need to understand would be "external
> > > > > > > > > > > > resource allocation" and the "ResourceInfoProvider", and the GPU specific
> > > > > > > > > > > > code would be a specific implementation only known to that component that
> > > > > > > > > > > > allocates the external resource. That fits the separation of concerns well.
> > > > > > > > > > > >
> > > > > > > > > > > > I also understand that it should not be over-engineered in the first
> > > > > > > > > > > > version, so some simplification makes sense, and then gradually expand from
> > > > > > > > > > > > there.
> > > > > > > > > > > >
> > > > > > > > > > > > So +1 to go ahead with what was suggested above (Xintong / Becket) from my
> > > > > > > > > > > > side.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <
> > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > >
> > > > > > > > > > > > > I see your concern, and I completely agree with you that we should first
> > > > > > > > > > > > > think about the "library" / "plugin" / "extension" style if possible.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > If GPUs are sliced and assigned during scheduling, there may be reason,
> > > > > > > > > > > > > > although it looks that it would belong to the slot then. Is that what we
> > > > > > > > > > > > > > are doing here?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > In the current proposal, we do not have the GPUs sliced and assigned to
> > > > > > > > > > > > > slots, because it could be problematic without dynamic slot allocation.
> > > > > > > > > > > > > E.g., the number of GPUs might not be evenly divisible by the number of
> > > > > > > > > > > > > slots.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think it makes sense to eventually have the GPUs assigned to slots. Even
> > > > > > > > > > > > > then, we might still need a TM level GPUManager (or ResourceProvider like
> > > > > > > > > > > > > Becket suggested). For memory, in each slot we can simply request the
> > > > > > > > > > > > > amount of memory, leaving it to JVM / OS to decide which memory (address)
> > > > > > > > > > > > > should be assigned. For GPU, and potentially other resources like FPGA, we
> > > > > > > > > > > > > need to explicitly specify which GPU (index) should be used. Therefore, we
> > > > > > > > > > > > > need some component at the TM level to coordinate which slot uses which
> > > > > > > > > > > > > GPU.
> > > > > > > > > > > > >
> > > > > > > > > > > > > IMO, unless we say Flink will not support slot-level GPU slicing at least
> > > > > > > > > > > > > in the foreseeable future, I don't see a good way to avoid touching the TM
> > > > > > > > > > > > > core. To that end, I think Becket's suggestion points to a good direction,
> > > > > > > > > > > > > that supports more features (GPU, FPGA, etc.) with less coupling to the TM
> > > > > > > > > > > > > core (only needs to understand the general interfaces). The detailed
> > > > > > > > > > > > > implementation for specific resource types can even be encapsulated as a
> > > > > > > > > > > > > library.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Becket
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for sharing your thoughts on the final state. Regardless of the
> > > > > > > > > > > > > details of how the interfaces should look, I think this is a really good
> > > > > > > > > > > > > abstraction for supporting general resource types.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'd like to further clarify that the following three things are all that
> > > > > > > > > > > > > the "Flink core" needs to understand.
> > > > > > > > > > > > >
> > > > > > > > > > > > >    - The *amount* of resource, for scheduling. Actually, we already have
> > > > > > > > > > > > >    the Resource class in ResourceProfile and ResourceSpec for extended
> > > > > > > > > > > > >    resource. It's just not really used.
> > > > > > > > > > > > >    - The *info*, that Flink provides to the operators / user code.
> > > > > > > > > > > > >    - The *provider*, which generates the info based on the amount.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The "core" does not need to understand the specific implementation details
> > > > > > > > > > > > > of the above three. They can even be implemented in a 3rd-party library,
> > > > > > > > > > > > > similar to how we allow users to define their custom MetricReporter.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <
> > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the comment, Stephan.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   - If everything becomes a "core feature", it will make the project hard
> > > > > > > > > > > > > > > to develop in the future. Thinking "library" / "plugin" / "extension" style
> > > > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Completely agree. It is much more important to design a mechanism than to
> > > > > > > > > > > > > > focus on a specific case. Here is what I am thinking to fully support
> > > > > > > > > > > > > > custom resource management:
> > > > > > > > > > > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
> > > > > > > > > > > > > > resource and the amount required. They will be used to find suitable TM
> > > > > > > > > > > > > > slots to run the tasks. At this point, the resources are only measured by
> > > > > > > > > > > > > > amount, i.e. they do not have individual IDs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to identify
> > > > > > > > > > > > > > and provide the detailed information of the individual resource, e.g. GPU
> > > > > > > > > > > > > > ID. It is important because the operator may have to explicitly interact
> > > > > > > > > > > > > > with the physical resource it uses. The ResourceInfoProvider might look
> > > > > > > > > > > > > > like something below.
> > > > > > > > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > > > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > > > > > > > > > > >         OperatorId opId, ResourceProfile resourceProfile);
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - There could be several "*ResourceInfoProvider*" configured on the TM to
> > > > > > > > > > > > > > retrieve the information for different resources.
> > > > > > > > > > > > > > - The TM will be responsible for assigning those individual resources to
> > > > > > > > > > > > > > each operator according to their requested amount.
> > > > > > > > > > > > > > - The operators will be able to get the ResourceInfo from their
> > > > > > > > > > > > > > RuntimeContext.
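
The interface and the three points above can be sketched roughly as
follows. This is a simplified illustration under stated assumptions, not
the FLIP's design: plain String ids and an int amount stand in for
AbstractID, OperatorId and ResourceProfile so the example is
self-contained, and GpuInfo, GpuInfoProvider and ProviderDemo are
hypothetical names. The provider hands out each GPU index at most once,
which is the TM-level coordination discussed in this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Demo entry point: two operators request GPUs from one TM-level provider. */
final class ProviderDemo {
    public static void main(String[] args) {
        ResourceInfoProvider<GpuInfo> provider = new GpuInfoProvider(3);
        System.out.println(provider.retrieveResourceInfo("op-a", 2).keySet());
        System.out.println(provider.retrieveResourceInfo("op-b", 1).keySet());
    }
}

/** Simplified stand-in for the interface quoted above (String/int replace Flink types). */
interface ResourceInfoProvider<INFO> {
    Map<String, INFO> retrieveResourceInfo(String operatorId, int amount);
}

/** The per-resource detail an operator needs: here just the device index. */
final class GpuInfo {
    final int index;

    GpuInfo(int index) {
        this.index = index;
    }
}

/** Hands out GPU indexes so that no index is assigned to two operators. */
final class GpuInfoProvider implements ResourceInfoProvider<GpuInfo> {
    private final Deque<Integer> freeIndexes = new ArrayDeque<>();

    GpuInfoProvider(int gpuCount) {
        for (int i = 0; i < gpuCount; i++) {
            freeIndexes.add(i);
        }
    }

    @Override
    public synchronized Map<String, GpuInfo> retrieveResourceInfo(String operatorId, int amount) {
        if (amount < 0 || amount > freeIndexes.size()) {
            throw new IllegalStateException(
                    "Operator " + operatorId + " requested " + amount
                            + " GPUs but only " + freeIndexes.size() + " are free");
        }
        Map<String, GpuInfo> assigned = new LinkedHashMap<>();
        for (int i = 0; i < amount; i++) {
            int index = freeIndexes.poll(); // non-null: size was checked above
            assigned.put("gpu-" + index, new GpuInfo(index));
        }
        return assigned;
    }
}
```

The scheduler only ever sees the amount. Which concrete index an
operator gets is decided inside the provider, so the TM core would only
need to know the generic interface.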
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If we agree this is a reasonable final state, we can adapt the current FLIP
> > > > > > > > > > > > > > to it. In fact it does not sound like a big change to me. All the proposed
> > > > > > > > > > > > > > configuration can stay as is; it is just that Flink itself won't care about
> > > > > > > > > > > > > > them. Instead, a GPUInfoProvider implementing the ResourceInfoProvider will
> > > > > > > > > > > > > > use them.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> > > > > > > sewen@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi all!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The main point I wanted to throw into the
> > discussion
> > > > > is the
> > > > > > > > > > > following:
> > > > > > > > > > > > > > >   - With more and more use cases, more and more
> > tools
> > > > > go
> > > > > > > into
> > > > > > > > > > Flink
> > > > > > > > > > > > > > >   - If everything becomes a "core feature", it
> > will
> > > > > make
> > > > > > > the
> > > > > > > > > > > project
> > > > > > > > > > > > > hard
> > > > > > > > > > > > > > > to develop in the future. Thinking "library" /
> > > > > "plugin" /
> > > > > > > > > > > "extension"
> > > > > > > > > > > > > > style
> > > > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   - A good thought experiment is always: How many future
> > > > > > > > > > > > > > > > developers have to interact with this code (and possibly
> > > > > > > > > > > > > > > > understand it partially), even if the features they touch
> > > > > > > > > > > > > > > > have nothing to do with GPU support. If many contributors
> > > > > > > > > > > > > > > > to unrelated features will have to touch it and
> > > > > > > > > > > > > > > > understand it, then let's think if there is a different
> > > > > > > > > > > > > > > > solution. Maybe there is not, but then we should be sure
> > > > > > > > > > > > > > > > why.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   - That led me to raising this issue: If the GPU manager
> > > > > > > > > > > > > > > > becomes a core service in the TaskManager, Environment,
> > > > > > > > > > > > > > > > RuntimeContext, etc., then everyone developing TM and
> > > > > > > > > > > > > > > > streaming tasks needs to understand the GPU manager. That
> > > > > > > > > > > > > > > > seems oddly specific, is my impression.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Access to configuration seems not the right reason to do
> > > > > > > > > > > > > > > > that. We should expose the Flink configuration from the
> > > > > > > > > > > > > > > > RuntimeContext anyways.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If GPUs are sliced and assigned during scheduling, there
> > > > > > > > > > > > > > > > may be a reason, although it looks like it would belong
> > > > > > > > > > > > > > > > to the slot then. Is that what we are doing here?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > IMO, eventually an operator should only see info of GPUs
> > > > > > > > > > > > > > > > > that are dedicated to it, instead of all GPUs on the
> > > > > > > > > > > > > > > > > machine/container as in the current design. It does not
> > > > > > > > > > > > > > > > > make sense to let the user who writes a UDF worry about
> > > > > > > > > > > > > > > > > coordination among multiple operators running on the
> > > > > > > > > > > > > > > > > same machine. And if we want to limit the GPU info an
> > > > > > > > > > > > > > > > > operator sees, we should not let the operator
> > > > > > > > > > > > > > > > > instantiate GPUManager, which means we have to expose
> > > > > > > > > > > > > > > > > something through runtime context, either GPU info or
> > > > > > > > > > > > > > > > > some kind of limited access to the GPUManager.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It probably makes sense for us to first agree on the
> > > > > > > > > > > > > > > > > > final state. More specifically, will the resource info
> > > > > > > > > > > > > > > > > > be exposed through runtime context eventually?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > If that is the final state and we have a seamless
> > > > > > > > > > > > > > > > > > migration story from this FLIP to that final state,
> > > > > > > > > > > > > > > > > > personally I think it is OK to expose the GPU info in
> > > > > > > > > > > > > > > > > > the runtime context.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > > > > > > > > I think what Stephan means (@Stephan, please correct me
> > > > > > > > > > > > > > > > > > > if I'm wrong) is that we might not need to hold and
> > > > > > > > > > > > > > > > > > > maintain the GPUManager as a service in
> > > > > > > > > > > > > > > > > > > TaskManagerServices or RuntimeContext. An alternative
> > > > > > > > > > > > > > > > > > > is to create / retrieve the GPUManager only in the
> > > > > > > > > > > > > > > > > > > operators that need it, e.g., with a static method
> > > > > > > > > > > > > > > > > > > `GPUManager.get()`.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > > > > > > > > I agree with you on excluding GPUManager from
> > > > > > > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >    - For the first step, where we provide unified
> > > > > > > > > > > > > > > > > > >    TM-level GPU information to all operators, it should
> > > > > > > > > > > > > > > > > > >    be fine to have operators access / lazily initialize
> > > > > > > > > > > > > > > > > > >    GPUManager by themselves.
> > > > > > > > > > > > > > > > > > >    - In future, we might have some more fine-grained
> > > > > > > > > > > > > > > > > > >    GPU management, where we need to maintain GPUManager
> > > > > > > > > > > > > > > > > > >    as a service and put GPU info in slot profiles. But
> > > > > > > > > > > > > > > > > > >    at least for now it's not necessary to introduce
> > > > > > > > > > > > > > > > > > >    such complexity.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > However, I have some concerns about excluding
> > > > > > > > > > > > > > > > > > > GPUManager from RuntimeContext and letting operators
> > > > > > > > > > > > > > > > > > > access it directly.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >    - Configurations needed for creating the GPUManager
> > > > > > > > > > > > > > > > > > >    are not always available to operators.
> > > > > > > > > > > > > > > > > > >    - If later we want to have fine-grained control over
> > > > > > > > > > > > > > > > > > >    GPU (e.g., operators in each slot can only see GPUs
> > > > > > > > > > > > > > > > > > >    reserved for that slot), the approach cannot be
> > > > > > > > > > > > > > > > > > >    easily extended.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I would suggest wrapping the GPUManager behind
> > > > > > > > > > > > > > > > > > > RuntimeContext and only exposing the GPUInfo to users.
> > > > > > > > > > > > > > > > > > > For now, we can declare a method `getGPUInfo()` in
> > > > > > > > > > > > > > > > > > > RuntimeContext, with a default definition that calls
> > > > > > > > > > > > > > > > > > > `GPUManager.get()` to get the lazily-created
> > > > > > > > > > > > > > > > > > > GPUManager. If later we want to create / retrieve
> > > > > > > > > > > > > > > > > > > GPUManager in a different way, we can simply change how
> > > > > > > > > > > > > > > > > > > `getGPUInfo` is implemented, without needing to change
> > > > > > > > > > > > > > > > > > > any public interfaces.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > > > > > > > Do you mean Minicluster? Yes, it makes sense to share
> > > > > > > > > > > > > > > > > > > > the GPU Manager in such a scenario.
> > > > > > > > > > > > > > > > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > > > > > > > > > > > > > > > GPUManager(ExternalResourceManagers) in TaskExecutor
> > > > > > > > > > > > > > > > > > > > instead of TaskManagerServices.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Regarding the RuntimeContext/FunctionContext, it just
> > > > > > > > > > > > > > > > > > > > holds the GPU info instead of the GPU Manager. AFAIK,
> > > > > > > > > > > > > > > > > > > > it's the only place we could pass GPU info to the
> > > > > > > > > > > > > > > > > > > > RichFunction/UserDefinedFunction.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <isaac@paddlesoft.net> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote ----
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Can we somehow keep this out of the TaskManager services
> > > > > > > > > > > > > > > > > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > > > > > > > > > > > > > > > > ExternalServicesManagers in future) is conceptually
> > > > > > > > > > > > > > > > > > > > > > one of the task manager services, just like
> > > > > > > > > > > > > > > > > > > > > > MemoryManager before 1.10.
> > > > > > > > > > > > > > > > > > > > > > - It maintains/holds the GPU resource at TM level and
> > > > > > > > > > > > > > > > > > > > > > all of the operators allocate the GPU resources from
> > > > > > > > > > > > > > > > > > > > > > it. So, it should be exclusive to a single
> > > > > > > > > > > > > > > > > > > > > > TaskExecutor.
> > > > > > > > > > > > > > > > > > > > > > - We could add a collection called
> > > > > > > > > > > > > > > > > > > > > > ExternalResourceManagers to hold all managers of
> > > > > > > > > > > > > > > > > > > > > > other external resources in the future.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Can you help me understand why this needs the addition
> > > > > > > > > > > > > > > > > > > > > in TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > > > > > > > > > > > > Are you worried about the case when multiple Task
> > > > > > > > > > > > > > > > > > > > > Executors run in the same JVM? That's not common, but
> > > > > > > > > > > > > > > > > > > > > wouldn't it actually be good in that case to share the
> > > > > > > > > > > > > > > > > > > > > GPU Manager, given that the GPU is shared?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > > > > > > > > > > > In this FLIP, operators need the information. Thus,
> > > > > > > > > > > > > > > > > > > > > > we expose GPU information to the
> > > > > > > > > > > > > > > > > > > > > > RuntimeContext/FunctionContext. The slot profile is
> > > > > > > > > > > > > > > > > > > > > > not aware of GPU resources as GPU is a TM level
> > > > > > > > > > > > > > > > > > > > > > resource now.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Can the GPU Manager be a "self contained" thing that
> > > > > > > > > > > > > > > > > > > > > > > simply takes the configuration, and then abstracts
> > > > > > > > > > > > > > > > > > > > > > > everything internally?
> > > > > > > > > > > > > > > > > > > > > > Yes, we just pass the path/args of the discovery
> > > > > > > > > > > > > > > > > > > > > > script and how many GPUs per TM to it. It takes the
> > > > > > > > > > > > > > > > > > > > > > responsibility to get the GPU information and expose
> > > > > > > > > > > > > > > > > > > > > > it to the RuntimeContext/FunctionContext of
> > > > > > > > > > > > > > > > > > > > > > Operators. Meanwhile, we'd better not allow operators
> > > > > > > > > > > > > > > > > > > > > > to directly access GPUManager; they should get what
> > > > > > > > > > > > > > > > > > > > > > they want from the Context. We could then decouple
> > > > > > > > > > > > > > > > > > > > > > the interface/implementation of GPUManager and the
> > > > > > > > > > > > > > > > > > > > > > Public API.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > It sounds fine to initially start with GPU specific
> > > > > > > > > > > > > > > > > > > > > > > support and think about generalizing this once we
> > > > > > > > > > > > > > > > > > > > > > > better understand the space.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > > > > > > > > > > > > > > > - Can we somehow keep this out of the TaskManager
> > > > > > > > > > > > > > > > > > > > > > > services? Anything we have to pull through all
> > > > > > > > > > > > > > > > > > > > > > > layers of the TM makes the TM components yet more
> > > > > > > > > > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > > > > > > > > > > > > -> do the slot profiles need information about the
> > > > > > > > > > > > > > > > > > > > > > > GPU?
> > > > > > > > > > > > > > > > > > > > > > > -> Can the GPU Manager be a "self contained" thing
> > > > > > > > > > > > > > > > > > > > > > > that simply takes the configuration, and then
> > > > > > > > > > > > > > > > > > > > > > > abstracts everything internally? Operators can
> > > > > > > > > > > > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks for all the feedbacks.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll
> > > > > > > > > > > > > > > > > > > > > > > > add them to the Public API section.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > > > > > > > > Regarding the general extended resource mechanism, I
> > > > > > > > > > > > > > > > > > > > > > > > second Xintong's suggestion.
> > > > > > > > > > > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and
> > > > > > > > > > > > > > > > > > > > > > > > ResourceSpec after we support fine-grained GPU
> > > > > > > > > > > > > > > > > > > > > > > > scheduling. As a first step proposal, I prefer to
> > > > > > > > > > > > > > > > > > > > > > > > not include it in the scope of this FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I
> > > > > > > > > > > > > > > > > > > > > > > > understand correctly, it is just a code refactoring
> > > > > > > > > > > > > > > > > > > > > > > > atm; we could extract the
> > > > > > > > > > > > > > > > > > > > > > > > open/close/allocateExtendResources of GPUManager to
> > > > > > > > > > > > > > > > > > > > > > > > that interface. If that is the case, +1 to do it
> > > > > > > > > > > > > > > > > > > > > > > > during implementation.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark supports a
> > > > > > > > > > > > > > > > > > > > > > > > general "Custom Resource Scheduling" before and
> > > > > > > > > > > > > > > > > > > > > > > > decided to introduce a common resource configuration
> > > > > > > > > > > > > > > > > > > > > > > > schema
> > > > > > > > > > > > > > > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > > > > > > > > to make it more extensible. I think the "resource"
> > > > > > > > > > > > > > > > > > > > > > > > is a proper level to contain all the configs of
> > > > > > > > > > > > > > > > > > > > > > > > extended resources.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hxbks2ks@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > There is no doubt that GPU resource management
> > > > > > > > > > > > > > > > > > > > > > > > > support will greatly facilitate the development of
> > > > > > > > > > > > > > > > > > > > > > > > > AI-related applications by PyFlink users.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Regarding the names of several GPU configurations,
> > > > > > > > > > > > > > > > > > > > > > > > > I think it is better to delete the resource field,
> > > > > > > > > > > > > > > > > > > > > > > > > which makes them consistent with the names of
> > > > > > > > > > > > > > > > > > > > > > > > > other resource-related configurations in
> > > > > > > > > > > > > > > > > > > > > > > > > TaskManagerOption.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline
> > > > > > > > > > > > > > > > > > > > > > > > > > discussion about making the "GPU Support" some
> > > > > > > > > > > > > > > > > > > > > > > > > > general "Extended Resource Support". We believe
> > > > > > > > > > > > > > > > > > > > > > > > > > supporting extended resources through a general
> > > > > > > > > > > > > > > > > > > > > > > > > > mechanism is definitely a good and extensible
> > > > > > > > > > > > > > > > > > > > > > > > > > way. The reason we propose this FLIP with its
> > > > > > > > > > > > > > > > > > > > > > > > > > scope narrowed down to GPU alone is mainly the
> > > > > > > > > > > > > > > > > > > > > > > > > > concern about the extra effort and review
> > > > > > > > > > > > > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > To come up with a good design for a general
> > > > > > > > > > > > > > > > > > > > > > > > > > extended resource management mechanism, we would
> > > > > > > > > > > > > > > > > > > > > > > > > > need to investigate more on how people use
> > > > > > > > > > > > > > > > > > > > > > > > > > different kinds of resources in practice. For
> > > > > > > > > > > > > > > > > > > > > > > > > > GPU, we learnt such knowledge from the experts,
> > > > > > > > > > > > > > > > > > > > > > > > > > Becket and his team members. But for FPGA, or
> > > > > > > > > > > > > > > > > > > > > > > > > > other potential extended resources, we don't have
> > > > > > > > > > > > > > > > > > > > > > > > > > such convenient information sources, so the
> > > > > > > > > > > > > > > > > > > > > > > > > > investigation would require more effort, which I
> > > > > > > > > > > > > > > > > > > > > > > > > > tend to think is not necessary atm.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On the other hand, we also looked into how Spark
> > > > > > > > > > > > > > > > > > > > > > > > > > supports a general "Custom Resource Scheduling".
> > > > > > > > > > > > > > > > > > > > > > > > > > Assuming we want to have a similar general
> > > > > > > > > > > > > > > > > > > > > > > > > > extended resource mechanism in the future, we
> > > > > > > > > > > > > > > > > > > > > > > > > > believe that the current GPU support design can
> > > > > > > > > > > > > > > > > > > > > > > > > > be easily extended in an incremental way, without
> > > > > > > > > > > > > > > > > > > > > > > > > > too much rework.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > - The most important part is
> > > > > probably
> > > > > > > user
> > > > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > > > > > Spark
> > > > > > > > > > > > > > > > > > > > > offers
> > > > > > > > > > > > > > > > > > > > > > > > > configuration options to
> > define
> > > > the
> > > > > > > amount,
> > > > > > > > > > > > > discovery
> > > > > > > > > > > > > > > > > script
> > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > vendor
> > > > > > > > > > > > > > > > > > > > > > > > > (on
> > > > > > > > > > > > > > > > > > > > > > > > > k8s) in a per resource type
> > bias
> > > > > [1],
> > > > > > > which
> > > > > > > > > > is
> > > > > > > > > > > very
> > > > > > > > > > > > > > > > similar
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > > > > proposed in this FLIP. I
> > think
> > > > > it's not
> > > > > > > > > > > necessary
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > expose
> > > > > > > > > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > > > > > > options
> > > > > > > > > > > > > > > > > > > > > > > > > in the general way atm,
> > since we
> > > > > do not
> > > > > > > > > have
> > > > > > > > > > > > > supports
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > > > > > > > types now. If later we
> > decided to
> > > > > have
> > > > > > > per
> > > > > > > > > > > resource
> > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > > > > > > options, we
> > > > > > > > > > > > > > > > > > > > > > > > > can have backwards
> > compatibility
> > > > > on the
> > > > > > > > > > current
> > > > > > > > > > > > > > > proposed
> > > > > > > > > > > > > > > > > > > options
> > > > > > > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > > > > > > > > simple key mapping.
> > > > > > > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if
> > later
> > > > > needed
> > > > > > > we
> > > > > > > > > can
> > > > > > > > > > > > > change
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > > > "Extended
> > > > > > > > > > > > > > > > > > > > > > > > > Resource Manager" (or
> > whatever it
> > > > > is
> > > > > > > > > called).
> > > > > > > > > > > That
> > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > pure
> > > > > > > > > > > > > > > > > > > > > > > > > component-internal
> > refactoring.
> > > > > > > > > > > > > > > > > > > > > > > > > - For ResourceProfile and
> > > > > ResourceSpec,
> > > > > > > > > there
> > > > > > > > > > > are
> > > > > > > > > > > > > > > already
> > > > > > > > > > > > > > > > > > > > > fields for
> > > > > > > > > > > > > > > > > > > > > > > > > general extended resource.
> > We can
> > > > > of
> > > > > > > course
> > > > > > > > > > > > > leverage
> > > > > > > > > > > > > > > them
> > > > > > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > > > > > > > > supporting
> > > > > > > > > > > > > > > > > > > > > > > > > fine grained GPU scheduling.
> > That
> > > > > is
> > > > > > > also
> > > > > > > > > not
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > scope
> > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > > > > > > > > > > > step proposal, and would
> > require
> > > > > > > FLIP-56 to
> > > > > > > > > > be
> > > > > > > > > > > > > > finished
> > > > > > > > > > > > > > > > > > first.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > To summary up, I agree with
> > > > Becket
> > > > > that
> > > > > > > > > have
> > > > > > > > > > a
> > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > general extended resource
> > > > > mechanism,
> > > > > > > and
> > > > > > > > > keep
> > > > > > > > > > > it in
> > > > > > > > > > > > > > > mind
> > > > > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > > > > > > discussing
> > > > > > > > > > > > > > > > > > > > > > > > > and implementing the current
> > one.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18
> > AM
> > > > > Becket
> > > > > > > Qin <
> > > > > > > > > > > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > That's a good point,
> > Stephan.
> > > > It
> > > > > > > makes
> > > > > > > > > > total
> > > > > > > > > > > > > sense
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > generalize
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > > resource management to
> > support
> > > > > custom
> > > > > > > > > > > resources.
> > > > > > > > > > > > > > > Having
> > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > allows
> > > > > > > > > > > > > > > > > > > > > > > users
> > > > > > > > > > > > > > > > > > > > > > > > > > to add new resources by
> > > > > themselves.
> > > > > > > The
> > > > > > > > > > > general
> > > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > > > > > involve two different
> > aspects:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type
> > > > > > > definition.
> > > > > > > > > It
> > > > > > > > > > is
> > > > > > > > > > > > > > > supported
> > > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > > > > > > > > > > resources in
> > ResourceProfile
> > > > and
> > > > > > > > > > > ResourceSpec.
> > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > likely
> > > > > > > > > > > > > > > > > > > > > cover
> > > > > > > > > > > > > > > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 2. The custom resource
> > > > allocation
> > > > > > > logic,
> > > > > > > > > > > i.e. how
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > assign
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > resources
> > > > > > > > > > > > > > > > > > > > > > > > > > to different tasks,
> > operators,
> > > > > and
> > > > > > > so on.
> > > > > > > > > > > This
> > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > require
> > > > > > > > > > > > > > > > > > > two
> > > > > > > > > > > > > > > > > > > > > > > levels /
> > > > > > > > > > > > > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > > > > > > > > > > > > a. Subtask level - make
> > sure
> > > > the
> > > > > > > subtasks
> > > > > > > > > > > are put
> > > > > > > > > > > > > > > into
> > > > > > > > > > > > > > > > > > > > > suitable
> > > > > > > > > > > > > > > > > > > > > > > > > slots.
> > > > > > > > > > > > > > > > > > > > > > > > > > It is done by the global
> > RM and
> > > > > is
> > > > > > > not
> > > > > > > > > > > > > customizable
> > > > > > > > > > > > > > > > right
> > > > > > > > > > > > > > > > > > > now.
> > > > > > > > > > > > > > > > > > > > > > > > > > b. Operator level - map the
> > > > exact
> > > > > > > > > resource
> > > > > > > > > > > to the
> > > > > > > > > > > > > > > > > operators
> > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > TM.
> > > > > > > > > > > > > > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > > > > > > > > > > > > > GPU 1 for operator A, GPU
> > 2 for
> > > > > > > operator
> > > > > > > > > B.
> > > > > > > > > > > This
> > > > > > > > > > > > > > step
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > needed
> > > > > > > > > > > > > > > > > > > > > > > assuming
> > > > > > > > > > > > > > > > > > > > > > > > > > the global RM does not
> > > > > distinguish
> > > > > > > > > > individual
> > > > > > > > > > > > > > > resources
> > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > > type.
> > > > > > > > > > > > > > > > > > > > > > > > > > It is true for memory, but
> > not
> > > > > for
> > > > > > > GPU.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > The GPU manager is
> > designed to
> > > > > do 2.b
> > > > > > > > > here.
> > > > > > > > > > > So it
> > > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > > > > > discover the
> > > > > > > > > > > > > > > > > > > > > > > > > > physical GPU information
> > and
> > > > > > > bind/match
> > > > > > > > > > them
> > > > > > > > > > > to
> > > > > > > > > > > > > > each
> > > > > > > > > > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > > > > > > > Making
> > > > > > > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > > > > > general will fill in the
> > > > missing
> > > > > > > piece to
> > > > > > > > > > > support
> > > > > > > > > > > > > > > > custom
> > > > > > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > > > > > > > > > definition. But I'd avoid
> > > > > calling it
> > > > > > > a
> > > > > > > > > > > "External
> > > > > > > > > > > > > > > > Resource
> > > > > > > > > > > > > > > > > > > > > Manager" to
> > > > > > > > > > > > > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > > > > > > > > > > > > > confusion with RM, maybe
> > > > > something
> > > > > > > like
> > > > > > > > > > > "Operator
> > > > > > > > > > > > > > > > > Resource
> > > > > > > > > > > > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > > > > be more accurate. So for
> > each
> > > > > > > resource
> > > > > > > > > type
> > > > > > > > > > > users
> > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > > > > > optional
> > > > > > > > > > > > > > > > > > > > > > > > > > "Operator Resource
> > Assigner" in
> > > > > the
> > > > > > > TM.
> > > > > > > > > For
> > > > > > > > > > > > > memory,
> > > > > > > > > > > > > > > > users
> > > > > > > > > > > > > > > > > > > don't
> > > > > > > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > > > > > > > > this,
> > > > > > > > > > > > > > > > > > > > > > > > > > but for other extended
> > > > resources,
> > > > > > > users
> > > > > > > > > may
> > > > > > > > > > > need
> > > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Personally I think a
> > pluggable
> > > > > > > "Operator
> > > > > > > > > > > Resource
> > > > > > > > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > achievable
> > > > > > > > > > > > > > > > > > > > > > > > > > in this FLIP. But I am
> > also OK
> > > > > with
> > > > > > > > > having
> > > > > > > > > > > that
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > > > > > > > > > > the interface between the
> > > > > "Operator
> > > > > > > > > > Resource
> > > > > > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > operator
> > > > > > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > > > > > take a while to settle
> > down if
> > > > we
> > > > > > > want to
> > > > > > > > > > > make it
> > > > > > > > > > > > > > > > > generic.
> > > > > > > > > > > > > > > > > > > But I
> > > > > > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > > > > > > > > > > > > implementation should take
> > this
> > > > > > > future
> > > > > > > > > work
> > > > > > > > > > > into
> > > > > > > > > > > > > > > > > > > consideration so
> > > > > > > > > > > > > > > > > > > > > > > that we
> > > > > > > > > > > > > > > > > > > > > > > > > > don't need to break
> > backwards
> > > > > > > > > compatibility
> > > > > > > > > > > once
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at
> > 12:27 AM
> > > > > > > Stephan
> > > > > > > > > > Ewen
> > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > sewen@apache.org>
> > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for writing
> > this
> > > > > FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > I cannot really give much
> > > > input
> > > > > > > into
> > > > > > > > > the
> > > > > > > > > > > > > > mechanics
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > GPU-aware
> > > > > > > > > > > > > > > > > > > > > > > > > > scheduling
> > > > > > > > > > > > > > > > > > > > > > > > > > > and GPU allocation, as I
> > have
> > > > > no
> > > > > > > > > > experience
> > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > One thought I had when
> > > > reading
> > > > > the
> > > > > > > > > > > proposal is
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > > > > > > > sense to
> > > > > > > > > > > > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > > > > > > > > > > the "GPU Manager" as an
> > > > > "External
> > > > > > > > > > Resource
> > > > > > > > > > > > > > > Manager",
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > is one
> > > > > > > > > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > > > > > > > > > > > > > > The way I understand the
> > > > > > > > > ResourceProfile
> > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > ResourceSpec,
> > > > > > > > > > > > > > > > > > > > > that is
> > > > > > > > > > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > > > > > > > > > > > > It has the advantage
> > that it
> > > > > looks
> > > > > > > more
> > > > > > > > > > > > > > extensible.
> > > > > > > > > > > > > > > > > Maybe
> > > > > > > > > > > > > > > > > > > > > there is
> > > > > > > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > Resource, a specialized
> > > > NVIDIA
> > > > > GPU
> > > > > > > > > > > Resource,
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > FPGA
> > > > > > > > > > > > > > > > > > > > > Resource, a
> > > > > > > > > > > > > > > > > > > > > > > > > Alibaba
> > > > > > > > > > > > > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at
> > 7:57
> > > > AM
> > > > > > > Becket
> > > > > > > > > > Qin <
> > > > > > > > > > > > > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP
> > Yangze.
> > > > > GPU
> > > > > > > > > > resource
> > > > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > > > > > > > > > > > > > > > for machine learning
> > use
> > > > > cases.
> > > > > > > > > > Actually
> > > > > > > > > > > it
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > mostly
> > > > > > > > > > > > > > > > > > > > > > > > > asked
> > > > > > > > > > > > > > > > > > > > > > > > > > > > question from the
> > users who
> > > > > are
> > > > > > > > > > > interested in
> > > > > > > > > > > > > > > using
> > > > > > > > > > > > > > > > > > Flink
> > > > > > > > > > > > > > > > > > > > > for ML.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Some quick comments /
> > > > > questions
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API
> > > > > should
> > > > > > > > > probably
> > > > > > > > > > > also
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > mentioned in
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Is the data
> > structure
> > > > that
> > > > > > > holds
> > > > > > > > > GPU
> > > > > > > > > > > info
> > > > > > > > > > > > > > > also a
> > > > > > > > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > > > > > API?
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at
> > > > 10:15
> > > > > AM
> > > > > > > > > Xintong
> > > > > > > > > > > Song
> > > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting
> > the
> > > > > FLIP
> > > > > > > and
> > > > > > > > > > > kicking
> > > > > > > > > > > > > off
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > discussion,
> > > > > > > > > > > > > > > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this
> > feature.
> > > > > > > Supporting
> > > > > > > > > > > using
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > Flink
> > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > especially for the ML
> > > > > > > scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the
> > FLIP
> > > > wiki
> > > > > > > doc and
> > > > > > > > > > it
> > > > > > > > > > > > > looks
> > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > me. I
> > > > > > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > very good first step
> > for
> > > > > > > Flink's
> > > > > > > > > GPU
> > > > > > > > > > > > > > supports.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020
> > at
> > > > > 12:06 PM
> > > > > > > > > > Yangze
> > > > > > > > > > > Guo
> > > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > > karmagyz@gmail.com
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We would like to
> > start
> > > > a
> > > > > > > > > discussion
> > > > > > > > > > > > > thread
> > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > "FLIP-108:
> > > > > > > > > > > > > > > > > > > > > Add
> > > > > > > > > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > support in
> > Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly
> > > > > discusses
> > > > > > > the
> > > > > > > > > > > following
> > > > > > > > > > > > > > > > issues:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to
> > > > > configure
> > > > > > > how
> > > > > > > > > many
> > > > > > > > > > > GPUs
> > > > > > > > > > > > > > in a
> > > > > > > > > > > > > > > > > task
> > > > > > > > > > > > > > > > > > > > > executor
> > > > > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > forward such
> > > > > requirements to
> > > > > > > the
> > > > > > > > > > > external
> > > > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > > > managers
> > > > > > > > > > > > > > > > > > > > > > > (for
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > Kubernetes/Yarn/Mesos
> > > > > > > setups).
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Provide
> > information
> > > > of
> > > > > > > > > available
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > resources
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Key changes
> > proposed in
> > > > > the
> > > > > > > FLIP
> > > > > > > > > > are
> > > > > > > > > > > as
> > > > > > > > > > > > > > > > follows:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU
> > resource
> > > > > > > > > requirements
> > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce
> > GPUManager
> > > > as
> > > > > > > one of
> > > > > > > > > > the
> > > > > > > > > > > task
> > > > > > > > > > > > > > > > manager
> > > > > > > > > > > > > > > > > > > > > services to
> > > > > > > > > > > > > > > > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and expose GPU
> > resource
> > > > > > > > > information
> > > > > > > > > > > to
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > context
> > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the
> > default
> > > > > > > script
> > > > > > > > > for
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > discovery,
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > which we
> > > > > > > > > > > > > > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the privilege mode
> > to
> > > > > help
> > > > > > > user
> > > > > > > > > to
> > > > > > > > > > > > > achieve
> > > > > > > > > > > > > > > > > > > worker-level
> > > > > > > > > > > > > > > > > > > > > > > isolation
> > > > > > > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please find more
> > > > details
> > > > > in
> > > > > > > the
> > > > > > > > > > FLIP
> > > > > > > > > > > wiki
> > > > > > > > > > > > > > > > > document
> > > > > > > > > > > > > > > > > > > [1].
> > > > > > > > > > > > > > > > > > > > > > > Looking
> > > > > > > > > > > > > > > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Till Rohrmann <tr...@apache.org>.
If there is no need for the ExternalResourceDriver on the RM side, then it
is always a good idea to keep things simple and not introduce it. One can
always change things once one realizes that there is a need for it.
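
As a rough illustration of the configuration-only route discussed in the
quoted messages below, the RM could derive the pod/container request entry
purely from configuration, without loading any driver code. This is a minimal
sketch, not Flink API: the class and method names are hypothetical, the option
names follow one reading of the 'external-resource.{resourceName}...' pattern
proposed in this thread (the final names may differ), and the resource keys
used in main are only example values.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: resolve an external resource request from configuration only. */
final class ExternalResourceConfigSketch {

    // Assumed option name, following the pattern proposed in this thread.
    static String amountOption(String resourceName) {
        return "external-resource." + resourceName + ".amount";
    }

    // Assumed option name; deployment is "yarn" or "kubernetes".
    static String keyOption(String resourceName, String deployment) {
        return "external-resource." + resourceName + "." + deployment + ".key";
    }

    /**
     * Returns the (deployment-specific resource key, amount) pair to set on a
     * pod/container request, or empty if either option is not configured.
     */
    static Optional<Map.Entry<String, Long>> requestEntry(
            Map<String, String> flinkConf, String resourceName, String deployment) {
        String key = flinkConf.get(keyOption(resourceName, deployment));
        String amount = flinkConf.get(amountOption(resourceName));
        if (key == null || amount == null) {
            return Optional.empty();
        }
        long parsedAmount = Long.parseLong(amount.trim());
        return Optional.of(new AbstractMap.SimpleImmutableEntry<>(key, parsedAmount));
    }

    public static void main(String[] args) {
        // Stand-in for the parsed flink-conf.yaml; the values are illustrative.
        Map<String, String> conf = new HashMap<>();
        conf.put("external-resource.gpu.amount", "2");
        conf.put("external-resource.gpu.kubernetes.key", "nvidia.com/gpu");
        conf.put("external-resource.gpu.yarn.key", "yarn.io/gpu");

        System.out.println(requestEntry(conf, "gpu", "kubernetes"));
        System.out.println(requestEntry(conf, "gpu", "yarn"));
        System.out.println(requestEntry(conf, "fpga", "yarn"));
    }
}
```

Since only strings from the cluster configuration are read here, nothing
user-provided would have to run inside the RM process.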

Cheers,
Till

On Mon, Mar 30, 2020 at 12:00 PM Yangze Guo <ka...@gmail.com> wrote:

> Hi @Till, @Xintong
>
> Even without the credential concerns, I think replacing the interfaces
> with configuration options is a good idea.
> - Currently, I don't see any external resource that is incompatible
> with this mechanism.
> - It reduces the burden on users of implementing a plugin themselves.
> WDYT?
>
> Best,
> Yangze Guo
>
> On Mon, Mar 30, 2020 at 5:44 PM Xintong Song <to...@gmail.com> wrote:
> >
> > I also agree that the pluggable ExternalResourceDriver should be loaded by
> > the cluster class loader. Although the plugin might be implemented by users,
> > external resources (as part of task executor resources) should be cluster
> > configurations, unlike job-level user code such as UDFs, because the task
> > executors belong to the cluster rather than to jobs.
> >
> >
> > IIUC, the concern Stephan raised is about the potential credential problem
> > when executing user code on the RM with the cluster class loader. The
> > concern makes sense to me, and I think what Yangze suggested is a good
> > approach to preventing such credential problems. The only reason we
> > tried to execute user code (i.e. getKubernetes/YarnExternalResource) on
> > the RM was that we need to set these key-value pairs on pod/container
> > requests. By replacing the interfaces getKubernetes/YarnExternalResource
> > with the configuration options
> > 'external-resource.{resourceName}.yarn/kubernetes.key/amount',
> > we can still fulfill that purpose without the credential risks.
> >
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann <tr...@apache.org> wrote:
> >
> > > At the moment the RM does not have a user code class loader and I agree
> > > with Stephan that it should stay like this. This, however, does not mean
> > > that we cannot support pluggable components in the RM. As long as the
> > > plugins are on the system's class path, it should be fine for the RM to
> > > load them. For example, we could add external resources via Flink's plugin
> > > mechanism or something similar.
> > >
> > > A very simple implementation of such an ExternalResourceDriver could be a
> > > class which simply returns what is written in the flink-conf.yaml under a
> > > given key.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo <ka...@gmail.com> wrote:
> > >
> > > > Hi, Stephan,
> > > >
> > > > I see your concern and I totally agree with you.
> > > >
> > > > The interface on the RM side is now `Map<String key, String/Long value>
> > > > getYarn/KubernetesExternalResource()`. The only valid information the RM
> > > > gets from it is the configuration key of that external resource in
> > > > Yarn/K8s. The "String/Long value" would be the same as the
> > > > external-resource.{resourceName}.amount.
> > > > So, I think it makes sense to replace these two interfaces with two
> > > > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> > > > may lose some extensibility, but AFAIK it could work with common
> > > > external resources like GPU and FPGA. WDYT?
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen <se...@apache.org>
> wrote:
> > > > >
> > > > > Maybe one final comment: It is probably not an issue, but let's try and
> > > > > keep user code (via user code classloader) out of the ResourceManager, if
> > > > > possible.
> > > > >
> > > > > As background:
> > > > >
> > > > > There were thoughts in the past to support setups where the RM must run
> > > > > with "superuser" credentials, but we cannot run JM/TM with these
> > > > > credentials, as the user code might access them otherwise.
> > > > > This is actually possible today, you can run the RM in a different JVM or
> > > > > in a different container, and give it more credentials than JMs / TMs. But
> > > > > for this to be feasible, we cannot allow any user-defined code to be in the
> > > > > JVM, because that instantaneously breaks the isolation of credentials.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <ka...@gmail.com>
> wrote:
> > > > >
> > > > > > Thanks for the feedback, @Till and @Xintong.
> > > > > >
> > > > > > Regarding separating the interface, I'm also +1 for it.
> > > > > >
> > > > > > Regarding the resource allocation interface, true, it's dangerous to
> > > > > > give too much access to user code. Changing the return type to Map<String
> > > > > > key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > > > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > > > > could also free us from the potential dependency issue.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <
> tonysong820@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > Thanks for updating the FLIP, Yangze.
> > > > > > >
> > > > > > > I agree with Till that we probably want to separate the K8s/Yarn decorator
> > > > > > > calls. Users can still configure one driver class, and we can use
> > > > > > > `instanceof` to check whether the driver implements K8s/Yarn-specific
> > > > > > > interfaces.
> > > > > > >
> > > > > > > Moreover, I'm not sure about exposing the entire `ContainerRequest` / `Pod`
> > > > > > > (`AbstractKubernetesStepDecorator` directly manipulates the `Pod`) to user
> > > > > > > code. It gives more access to user code than needed for defining an external
> > > > > > > resource, which might cause problems. Instead, I would suggest having an
> > > > > > > interface like `Map<String key, String value>
> > > > > > > getYarn/KubernetesExternalResource()` and assembling them into
> > > > > > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> > > > > > >
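The combination of deployment-specific interfaces, an `instanceof` check, and key-value pairs instead of full `ContainerRequest` / `Pod` access might look roughly like the sketch below. All type names (`DriverSketch`, `KubernetesResourceSketch`, `YarnResourceSketch`, `GpuDriverSketch`, `DriverDispatchSketch`) are hypothetical and do not come from the FLIP; only the method names `getKubernetesExternalResource` / `getYarnExternalResource` follow the suggestion above, and `nvidia.com/gpu` is an example value.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical base interface: knows nothing about any deployment target. */
interface DriverSketch {
    String resourceName();
}

/** Hypothetical opt-in interface for drivers that support Kubernetes. */
interface KubernetesResourceSketch {
    Map<String, String> getKubernetesExternalResource();
}

/** Hypothetical opt-in interface for drivers that support Yarn. */
interface YarnResourceSketch {
    Map<String, String> getYarnExternalResource();
}

/** Example driver that supports Kubernetes only, so it needs no Hadoop classes. */
final class GpuDriverSketch implements DriverSketch, KubernetesResourceSketch {
    @Override
    public String resourceName() {
        return "gpu";
    }

    @Override
    public Map<String, String> getKubernetesExternalResource() {
        return Collections.singletonMap("nvidia.com/gpu", "2");
    }
}

/** RM-side dispatch: the RM, not the driver, would assemble the actual request. */
final class DriverDispatchSketch {
    static Map<String, String> kubernetesEntries(DriverSketch driver) {
        if (driver instanceof KubernetesResourceSketch) {
            return ((KubernetesResourceSketch) driver).getKubernetesExternalResource();
        }
        return Collections.emptyMap(); // driver does not support Kubernetes
    }

    static Map<String, String> yarnEntries(DriverSketch driver) {
        if (driver instanceof YarnResourceSketch) {
            return ((YarnResourceSketch) driver).getYarnExternalResource();
        }
        return Collections.emptyMap(); // driver does not support Yarn
    }
}
```

In this sketch a driver expresses "not supported on Yarn" simply by not implementing the Yarn interface, and the driver only ever hands plain strings to the RM.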
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <
> > > trohrmann@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > I'm a bit late to the party. I think the current proposal looks good.
> > > > > > > >
> > > > > > > > Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> > > > > > > > would suggest to not include the decorator calls for Kubernetes and Yarn in
> > > > > > > > the base interface. Instead I would suggest to segregate the deployment
> > > > > > > > specific decorator calls into separate interfaces. That way an
> > > > > > > > ExternalResourceDriver does not have to support all deployments from the
> > > > > > > > very beginning. Moreover, some resources might not be supported by a
> > > > > > > > specific deployment target and the natural way to express this would be to
> > > > > > > > not implement the respective deployment specific interface.
> > > > > > > >
> > > > > > > > Moreover, having void
> > > > > > > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > > > > > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > > > > > > classpath whenever the external resource driver is being used.
> > > > > > > > [1]
> > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <
> sewen@apache.org>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Nice, thanks a lot!
> > > > > > > > >
> > > > > > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <
> > > karmagyz@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the suggestion, @Stephan, @Becket and
> @Xintong.
> > > > > > > > > >
> > > > > > > > > > I've updated the FLIP accordingly. I did not add a
> > > > > > > > > > ResourceInfoProvider. Instead, I introduced the ExternalResourceDriver,
> > > > > > > > > > which takes responsibility for all relevant operations on both the RM
> > > > > > > > > > and TM sides.
> > > > > > > > > > After rethinking the decoupling of external resource management
> > > > > > > > > > from TaskExecutor, I think we could do the same thing on the
> > > > > > > > > > ResourceManager side. We do not need to add specific allocation
> > > > > > > > > > logic to the ResourceManager each time we add a specific external
> > > > > > > > > > resource.
> > > > > > > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > > > > > > containerRequest.
> > > > > > > > > > - For Kubernetes, ExternalResourceDriver could provide a decorator for
> > > > > > > > > > the TM pod.
> > > > > > > > > >
> > > > > > > > > > In this way, just like MetricReporter, we allow users to define their
> > > > > > > > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > > > > > > > separation of concerns. For more details, please take a look at [1].
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <
> > > sewen@apache.org
> > > > >
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > This sounds good to go ahead from my side.
> > > > > > > > > > >
> > > > > > > > > > > I like the approach that Becket suggested - in that
> case
> > > the
> > > > core
> > > > > > > > > > > abstraction that everyone would need to understand
> would be
> > > > > > "external
> > > > > > > > > > > resource allocation" and the "ResourceInfoProvider",
> and
> > > the
> > > > GPU
> > > > > > > > > specific
> > > > > > > > > > > code would be a specific implementation only known to
> that
> > > > > > component
> > > > > > > > > that
> > > > > > > > > > > allocates the external resource. That fits the
> separation
> > > of
> > > > > > concerns
> > > > > > > > > > well.
> > > > > > > > > > >
> > > > > > > > > > > I also understand that it should not be
> over-engineered in
> > > > the
> > > > > > first
> > > > > > > > > > > version, so some simplification makes sense, and then
> > > > gradually
> > > > > > > > expand
> > > > > > > > > > from
> > > > > > > > > > > there.
> > > > > > > > > > >
> > > > > > > > > > > So +1 to go ahead with what was suggested above
> (Xintong /
> > > > > > Becket)
> > > > > > > > from
> > > > > > > > > > my
> > > > > > > > > > > side.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <
> > > > > > tonysong820@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > > > > > > >
> > > > > > > > > > > > @Stephan
> > > > > > > > > > > >
> > > > > > > > > > > > I see your concern, and I completely agree with you
> that
> > > we
> > > > > > should
> > > > > > > > > > first
> > > > > > > > > > > > think about the "library" / "plugin" / "extension"
> style
> > > if
> > > > > > > > possible.
> > > > > > > > > > > >
> > > > > > > > > > > > If GPUs are sliced and assigned during scheduling,
> there
> > > > may be
> > > > > > > > > reason,
> > > > > > > > > > > > > although it looks that it would belong to the slot
> > > then.
> > > > Is
> > > > > > that
> > > > > > > > > > what we
> > > > > > > > > > > > > are doing here?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > In the current proposal, we do not have the GPUs
> sliced
> > > and
> > > > > > > > assigned
> > > > > > > > > to
> > > > > > > > > > > > slots, because it could be problematic without
> dynamic
> > > slot
> > > > > > > > > allocation.
> > > > > > > > > > > > E.g., the number of GPUs might not be evenly
> divisible by
> > > > the
> > > > > > > > number
> > > > > > > > > of
> > > > > > > > > > > > slots.
> > > > > > > > > > > >
> > > > > > > > > > > > I think it makes sense to eventually have the GPUs
> > > > assigned to
> > > > > > > > slots.
> > > > > > > > > > Even
> > > > > > > > > > > > then, we might still need a TM level GPUManager (or
> > > > > > > > ResourceProvider
> > > > > > > > > > like
> > > > > > > > > > > > Becket suggested). For memory, in each slot we can
> simply
> > > > > > request
> > > > > > > > the
> > > > > > > > > > > > amount of memory, leaving it to JVM / OS to decide
> which
> > > > memory
> > > > > > > > > > (address)
> > > > > > > > > > > > should be assigned. For GPU, and potentially other
> > > > resources
> > > > > > like
> > > > > > > > > > FPGA, we
> > > > > > > > > > > > need to explicitly specify which GPU (index) should
> be
> > > > used.
> > > > > > > > > > Therefore, we
> > > > > > > > > > > > need some component at the TM level to coordinate
> which
> > > > slot
> > > > > > uses
> > > > > > > > > which
> > > > > > > > > > > > GPU.
> > > > > > > > > > > >
> > > > > > > > > > > > IMO, unless we say Flink will not support slot-level
> GPU
> > > > > > slicing at
> > > > > > > > > > least
> > > > > > > > > > > > in the foreseeable future, I don't see a good way to
> > > avoid
> > > > > > touching
> > > > > > > > > > the TM
> > > > > > > > > > > > core. To that end, I think Becket's suggestion
> points to
> > > a
> > > > good
> > > > > > > > > > direction,
> > > > > > > > > > > > that supports more features (GPU, FPGA, etc.) with
> less
> > > > > > coupling to
> > > > > > > > > > the TM
> > > > > > > > > > > > core (only needs to understand the general
> interfaces).
> > > The
> > > > > > > > detailed
> > > > > > > > > > > > implementation for specific resource types can even
> be
> > > > > > encapsulated
> > > > > > > > > as
> > > > > > > > > > a
> > > > > > > > > > > > library.
> > > > > > > > > > > >
> > > > > > > > > > > > @Becket
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for sharing your thoughts on the final state. Regardless of the
> > > > > > > > > > > > details of how the interfaces should look, I think this is a really good
> > > > > > > > > > > > abstraction for supporting general resource types.
> > > > > > > > > > > >
> > > > > > > > > > > > I'd like to further clarify that, the following three
> > > > things
> > > > > > are
> > > > > > > > all
> > > > > > > > > > that
> > > > > > > > > > > > the "Flink core" needs to understand.
> > > > > > > > > > > >
> > > > > > > > > > > >    - The *amount* of resource, for scheduling.
> Actually,
> > > we
> > > > > > already
> > > > > > > > > > have
> > > > > > > > > > > >    the Resource class in ResourceProfile and
> ResourceSpec
> > > > for
> > > > > > > > > extended
> > > > > > > > > > > >    resource. It's just not really used.
> > > > > > > > > > > >    - The *info*, that Flink provides to the
> operators /
> > > > user
> > > > > > codes.
> > > > > > > > > > > >    - The *provider*, which generates the info based
> on
> > > the
> > > > > > amount.
> > > > > > > > > > > >
> > > > > > > > > > > > The "core" does not need to understand the specific
> > > > > > implementation
> > > > > > > > > > details
> > > > > > > > > > > > of the above three. They can even be implemented in a
> > > > 3rd-party
> > > > > > > > > > library.
> > > > > > > > > > > > Similar to how we allow users to define their custom
> > > > > > > > MetricReporter.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <
> > > > > > becket.qin@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the comment, Stephan.
> > > > > > > > > > > > >
> > > > > > > > > > > > >   - If everything becomes a "core feature", it will
> > > make
> > > > the
> > > > > > > > > project
> > > > > > > > > > hard
> > > > > > > > > > > > > > to develop in the future. Thinking "library" /
> > > > "plugin" /
> > > > > > > > > > "extension"
> > > > > > > > > > > > > style
> > > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Completely agree. It is much more important to
> design a
> > > > > > mechanism
> > > > > > > > > > than
> > > > > > > > > > > > > focusing on a specific case. Here is what I am
> thinking
> > > > to
> > > > > > fully
> > > > > > > > > > support
> > > > > > > > > > > > > custom resource management:
> > > > > > > > > > > > > 1. On the JM / RM side, use ResourceProfile and
> > > > ResourceSpec
> > > > > > to
> > > > > > > > > > define
> > > > > > > > > > > > the
> > > > > > > > > > > > > resource and the amount required. They will be
> used to
> > > > find
> > > > > > > > > suitable
> > > > > > > > > > TMs
> > > > > > > > > > > > > slots to run the tasks. At this point, the
> resources
> > > are
> > > > only
> > > > > > > > > > measured by
> > > > > > > > > > > > > amount, i.e. they do not have individual ID.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to identify
> > > > > > > > > > > > > and provide the detailed information of the individual resource, e.g. GPU
> > > > > > > > > > > > > ID. It is important because the operator may have to explicitly interact
> > > > > > > > > > > > > with the physical resource it uses. The ResourceInfoProvider might look
> > > > > > > > > > > > > like something below.
> > > > > > > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > > > > > > > > > >             OperatorId opId, ResourceProfile resourceProfile);
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > - There could be several "*ResourceInfoProvider*"
> > > > configured
> > > > > > on
> > > > > > > > the
> > > > > > > > > > TM to
> > > > > > > > > > > > > retrieve the information for different resources.
> > > > > > > > > > > > > - The TM will be responsible to assign those
> individual
> > > > > > resources
> > > > > > > > > to
> > > > > > > > > > each
> > > > > > > > > > > > > operator according to their requested amount.
> > > > > > > > > > > > > - The operators will be able to get the
> ResourceInfo
> > > from
> > > > > > their
> > > > > > > > > > > > > RuntimeContext.
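A simplified version of the provider idea above (amount in, concrete device identifiers out, assigned at the TM level) could be sketched as follows. This is only an illustration under stated assumptions: `ResourceInfoProviderSketch` and `GpuIndexProviderSketch` are hypothetical names, operators are identified by a plain `String`, the requested amount is a plain `int` rather than a ResourceProfile, and GPU discovery is replaced by a fixed count.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified provider: turns a requested amount into concrete info. */
interface ResourceInfoProviderSketch<INFO> {
    List<INFO> retrieveResourceInfo(String operatorId, int amount);
}

/** Hands out GPU indices so that no two operators share a device. */
final class GpuIndexProviderSketch implements ResourceInfoProviderSketch<Integer> {
    private final Deque<Integer> free = new ArrayDeque<>();
    private final Map<String, List<Integer>> assigned = new HashMap<>();

    /** Assumption: discovery already happened and found indices 0..discoveredGpus-1. */
    GpuIndexProviderSketch(int discoveredGpus) {
        for (int i = 0; i < discoveredGpus; i++) {
            free.addLast(i);
        }
    }

    @Override
    public synchronized List<Integer> retrieveResourceInfo(String operatorId, int amount) {
        List<Integer> existing = assigned.get(operatorId);
        if (existing != null) {
            return existing; // repeated calls return the same assignment
        }
        if (amount < 0 || amount > free.size()) {
            throw new IllegalStateException("Not enough free GPUs for " + operatorId);
        }
        List<Integer> indices = new ArrayList<>(amount);
        for (int i = 0; i < amount; i++) {
            indices.add(free.pollFirst());
        }
        List<Integer> result = Collections.unmodifiableList(indices);
        assigned.put(operatorId, result);
        return result;
    }
}
```

The scheduling side only deals with amounts; the index bookkeeping stays inside the provider, which is what lets the core remain unaware of GPU specifics.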
> > > > > > > > > > > > >
> > > > > > > > > > > > > If we agree this is a reasonable final state, we can adapt the current FLIP
> > > > > > > > > > > > > to it. In fact it does not sound like a big change to me. All the proposed
> > > > > > > > > > > > > configuration can stay as is; it is just that Flink itself won't care about
> > > > > > > > > > > > > them, instead a GPUInfoProvider implementing the ResourceInfoProvider will
> > > > > > > > > > > > > use them.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> > > > > > sewen@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi all!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The main point I wanted to throw into the
> discussion
> > > > is the
> > > > > > > > > > following:
> > > > > > > > > > > > > >   - With more and more use cases, more and more
> tools
> > > > go
> > > > > > into
> > > > > > > > > Flink
> > > > > > > > > > > > > >   - If everything becomes a "core feature", it
> will
> > > > make
> > > > > > the
> > > > > > > > > > project
> > > > > > > > > > > > hard
> > > > > > > > > > > > > > to develop in the future. Thinking "library" /
> > > > "plugin" /
> > > > > > > > > > "extension"
> > > > > > > > > > > > > style
> > > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   - A good thought experiment is always: How many
> > > > future
> > > > > > > > > developers
> > > > > > > > > > > > have
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > interact with this code (and possibly understand
> it
> > > > > > partially),
> > > > > > > > > > even if
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > features they touch have nothing to do with GPU
> > > > support. If
> > > > > > > > many
> > > > > > > > > > > > > > contributors to unrelated features will have to
> touch
> > > > it
> > > > > > and
> > > > > > > > > > understand
> > > > > > > > > > > > > it,
> > > > > > > > > > > > > > then let's think if there is a different
> solution.
> > > > Maybe
> > > > > > there
> > > > > > > > is
> > > > > > > > > > not,
> > > > > > > > > > > > > but
> > > > > > > > > > > > > > then we should be sure why.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   - That led me to raising this issue: If the GPU
> > > > manager
> > > > > > > > > becomes a
> > > > > > > > > > > > core
> > > > > > > > > > > > > > service in the TaskManager, Environment,
> > > > RuntimeContext,
> > > > > > etc.
> > > > > > > > > then
> > > > > > > > > > > > > everyone
> > > > > > > > > > > > > > developing TM and streaming tasks need to
> understand
> > > > the
> > > > > > GPU
> > > > > > > > > > manager.
> > > > > > > > > > > > > That
> > > > > > > > > > > > > > seems oddly specific, is my impression.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Access to configuration seems not the right
> reason to
> > > > do
> > > > > > that.
> > > > > > > > We
> > > > > > > > > > > > should
> > > > > > > > > > > > > > expose the Flink configuration from the
> > > RuntimeContext
> > > > > > anyways.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If GPUs are sliced and assigned during
> scheduling,
> > > > there
> > > > > > may be
> > > > > > > > > > reason,
> > > > > > > > > > > > > > although it looks that it would belong to the
> slot
> > > > then. Is
> > > > > > > > that
> > > > > > > > > > what
> > > > > > > > > > > > we
> > > > > > > > > > > > > > are doing here?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > IMO, eventually an operator should only see info of GPUs that are dedicated
> > > > > > > > > > > > > > > to it, instead of all GPUs on the machine/container as in the current design.
> > > > > > > > > > > > > > > It does not make sense to make the user who writes a UDF worry about
> > > > > > > > > > > > > > > coordination among multiple operators running on the same machine. And if
> > > > > > > > > > > > > > > we want to limit the GPU info an operator sees, we should not let the
> > > > > > > > > > > > > > > operator instantiate GPUManager, which means we have to expose something
> > > > > > > > > > > > > > > through runtime context, either GPU info or some kind of limited access to
> > > > > > > > > > > > > > > the GPUManager.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> > > > > > > > > becket.qin@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It probably makes sense for us to first agree on the final state. More
> > > > > > > > > > > > > > > > specifically, will the resource info be exposed through runtime context
> > > > > > > > > > > > > > > > eventually?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If that is the final state and we have a seamless migration story from this
> > > > > > > > > > > > > > > > FLIP to that final state, personally I think it is OK to expose the GPU
> > > > > > > > > > > > > > > > info in the runtime context.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong
> Song <
> > > > > > > > > > > > tonysong820@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > > > > > > I think what Stephan means (@Stephan,
> please
> > > > correct
> > > > > > me
> > > > > > > > if
> > > > > > > > > > I'm
> > > > > > > > > > > > > wrong)
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > that, we might not need to hold and
> maintain
> > > the
> > > > > > > > GPUManager
> > > > > > > > > > as a
> > > > > > > > > > > > > > > service
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > TaskManagerServices or RuntimeContext. An
> > > > > > alternative is
> > > > > > > > to
> > > > > > > > > > > > create
> > > > > > > > > > > > > /
> > > > > > > > > > > > > > > > > retrieve the GPUManager only in the
> operators
> > > > that
> > > > > > need
> > > > > > > > it,
> > > > > > > > > > e.g.,
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > > > > > > I agree with you on excluding GPUManager
> from
> > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >    - For the first step, where we provide
> > > unified
> > > > > > > > TM-level
> > > > > > > > > > GPU
> > > > > > > > > > > > > > > > information
> > > > > > > > > > > > > > > > >    to all operators, it should be fine to
> have
> > > > > > operators
> > > > > > > > > > access /
> > > > > > > > > > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > > > > > > > > > >    - In future, we might have some more
> > > > fine-grained
> > > > > > GPU
> > > > > > > > > > > > > management,
> > > > > > > > > > > > > > > > where
> > > > > > > > > > > > > > > > >    we need to maintain GPUManager as a
> service
> > > > and
> > > > > > put
> > > > > > > > GPU
> > > > > > > > > > info
> > > > > > > > > > > > in
> > > > > > > > > > > > > > slot
> > > > > > > > > > > > > > > > >    profiles. But at least for now it's not
> > > > necessary
> > > > > > to
> > > > > > > > > > introduce
> > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > >    complexity.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > However, I have some concerns on excluding
> > > > GPUManager
> > > > > > > > from
> > > > > > > > > > > > > > > RuntimeContext
> > > > > > > > > > > > > > > > > and let operators access it directly.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >    - Configurations needed for creating the GPUManager are not always
> > > > > > > > > > > > > > > > >    available to operators.
> > > > > > > > > > > > > > > > >    - If later we want to have fine-grained control over GPUs (e.g.,
> > > > > > > > > > > > > > > > >    operators in each slot can only see GPUs reserved for that slot), the
> > > > > > > > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I would suggest to wrap the GPUManager
> behind
> > > > > > > > > RuntimeContext
> > > > > > > > > > and
> > > > > > > > > > > > > only
> > > > > > > > > > > > > > > > > expose the GPUInfo to users. For now, we
> can
> > > > declare
> > > > > > a
> > > > > > > > > method
> > > > > > > > > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a
> > > default
> > > > > > > > definition
> > > > > > > > > > that
> > > > > > > > > > > > > > calls
> > > > > > > > > > > > > > > > > `GPUManager.get()` to get the
> lazily-created
> > > > > > GPUManager.
> > > > > > > > If
> > > > > > > > > > later
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > want
> > > > > > > > > > > > > > > > > to create / retrieve GPUManager in a
> different
> > > > way,
> > > > > > we
> > > > > > > > can
> > > > > > > > > > simply
> > > > > > > > > > > > > > > change
> > > > > > > > > > > > > > > > > how `getGPUInfo` is implemented, without
> > > needing
> > > > to
> > > > > > > > change
> > > > > > > > > > any
> > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > interfaces.
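The idea of hiding the manager behind the runtime context, with a default `getGPUInfo()` that falls back to a lazily created `GPUManager.get()`, might be sketched like this. The types `GpuManagerSketch` and `RuntimeContextSketch` are hypothetical stand-ins, GPU info is reduced to a list of indices, and discovery is faked with fixed values; only the method name `getGPUInfo` and the static `get()` pattern are taken from the text above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the GPUManager discussed in the thread. */
final class GpuManagerSketch {
    private static GpuManagerSketch instance;
    private final List<Integer> gpuIndices;

    private GpuManagerSketch(List<Integer> gpuIndices) {
        this.gpuIndices = gpuIndices;
    }

    /** Lazily creates the manager on first use; discovery is faked with fixed indices. */
    static synchronized GpuManagerSketch get() {
        if (instance == null) {
            instance = new GpuManagerSketch(
                    Collections.unmodifiableList(Arrays.asList(0, 1)));
        }
        return instance;
    }

    List<Integer> getGpuIndices() {
        return gpuIndices;
    }
}

/** Hypothetical slice of a runtime context: users see the info, never the manager. */
interface RuntimeContextSketch {
    default List<Integer> getGPUInfo() {
        // Only this body would need to change if the manager is later created or
        // looked up differently (e.g. per slot); the public method stays the same.
        return GpuManagerSketch.get().getGpuIndices();
    }
}
```

A user function would call `getGPUInfo()` on its context and never reference the manager type, so swapping the lookup strategy later does not touch user code.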
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze
> Guo <
> > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > > > > > Do you mean Minicluster? Yes, it makes
> sense
> > > to
> > > > > > share
> > > > > > > > the
> > > > > > > > > > GPU
> > > > > > > > > > > > > > Manager
> > > > > > > > > > > > > > > > > > in such scenario.
> > > > > > > > > > > > > > > > > > If that's what you worry about, I'm +1
> for
> > > > holding
> > > > > > > > > > > > > > > > > > GPUManager(ExternalResourceManagers) in
> > > > > > TaskExecutor
> > > > > > > > > > instead of
> > > > > > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Regarding the
> RuntimeContext/FunctionContext,
> > > > it
> > > > > > just
> > > > > > > > > > holds the
> > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > info instead of the GPU Manager. AFAIK,
> it's
> > > > the
> > > > > > only
> > > > > > > > > > place we
> > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > pass GPU info to the
> > > > > > RichFunction/UserDefinedFunction.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac
> > > Godfried
> > > > <
> > > > > > > > > > > > > > isaac@paddlesoft.net
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> > > > > > > > > sewen@apache.org
> > > > > > > > > > > > wrote
> > > > > > > > > > > > > > > ----
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Can we somehow keep this out of the
> > > > > > TaskManager
> > > > > > > > > > services
> > > > > > > > > > > > > > > > > > > > I fear that we could not. IMO, the
> > > > > > GPUManager(or
> > > > > > > > > > > > > > > > > > > > ExternalServicesManagers in future)
> is
> > > > > > conceptually
> > > > > > > > > > one of
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > task
> > > > > > > > > > > > > > > > > > > > manager services, just like
> MemoryManager
> > > > > > before
> > > > > > > > > 1.10.
> > > > > > > > > > > > > > > > > > > > - It maintains/holds the GPU
> resource at
> > > TM
> > > > > > level
> > > > > > > > and
> > > > > > > > > > all
> > > > > > > > > > > > of
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > operators allocate the GPU resources
> from
> > > > it.
> > > > > > So,
> > > > > > > > it
> > > > > > > > > > should
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > > > > > > > > > - We could add a collection called
> > > > > > > > > > ExternalResourceManagers
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > hold
> > > > > > > > > > > > > > > > > > > > all managers of other external
> resources
> > > > in the
> > > > > > > > > future.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Can you help me understand why this needs the addition in
> > > > > > > > > > > > > > > > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > > > > > > > > > > Are you worried about the case when
> > > multiple
> > > > Task
> > > > > > > > > > Executors
> > > > > > > > > > > > run
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > JVM? That's not common, but wouldn't it
> > > > actually
> > > > > > be
> > > > > > > > > good
> > > > > > > > > > in
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > case
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > share the GPU Manager, given that the
> GPU
> > > is
> > > > > > shared?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > What parts need information about
> this?
> > > > > > > > > > > > > > > > > > > > > In this FLIP, operators need the information. Thus, we expose GPU
> > > > > > > > > > > > > > > > > > > > > information to the RuntimeContext/FunctionContext. The slot profile
> > > > > > > > > > > > > > > > > > > > > is not aware of GPU resources, as GPU is a TM-level resource now.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Can the GPU Manager be a "self
> > > contained"
> > > > > > thing
> > > > > > > > > that
> > > > > > > > > > > > simply
> > > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > configuration, and then abstracts
> > > > everything
> > > > > > > > > > internally?
> > > > > > > > > > > > > > > > > > > > > Yes, we just pass the path/args of the discovery script and how many
> > > > > > > > > > > > > > > > > > > > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > > > > > > > > > > > > > > > > > > > information and expose it to the RuntimeContext/FunctionContext of
> > > > > > > > > > > > > > > > > > > > > operators. Meanwhile, we'd better not allow operators to access
> > > > > > > > > > > > > > > > > > > > > GPUManager directly; they should get what they need from the Context.
> > > > > > > > > > > > > > > > > > > > > We could then decouple the interface/implementation of GPUManager
> > > > > > > > > > > > > > > > > > > > > from the Public API.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM
> Stephan
> > > > Ewen <
> > > > > > > > > > > > > sewen@apache.org
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > It sounds fine to initially start
> with
> > > > GPU
> > > > > > > > specific
> > > > > > > > > > > > support
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > > > > generalizing this once we better
> > > > understand
> > > > > > the
> > > > > > > > > > space.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > About the implementation suggested
> in
> > > > > > FLIP-108:
> > > > > > > > > > > > > > > > > > > > > - Can we somehow keep this out of
> the
> > > > > > TaskManager
> > > > > > > > > > > > services?
> > > > > > > > > > > > > > > > > Anything
> > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > have to pull through all layers of
> the
> > > TM
> > > > > > makes
> > > > > > > > the
> > > > > > > > > > TM
> > > > > > > > > > > > > > > components
> > > > > > > > > > > > > > > > > yet
> > > > > > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > - What parts need information about
> > > this?
> > > > > > > > > > > > > > > > > > > > > -> do the slot profiles need
> > > information
> > > > > > about
> > > > > > > > the
> > > > > > > > > > GPU?
> > > > > > > > > > > > > > > > > > > > > -> Can the GPU Manager be a "self
> > > > contained"
> > > > > > > > thing
> > > > > > > > > > that
> > > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > > > > > the configuration, and then
> abstracts
> > > > > > everything
> > > > > > > > > > > > > internally?
> > > > > > > > > > > > > > > > > > Operators
> > > > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > > access it via "GPUManager.get()"
> or so?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM
> Yangze
> > > > Guo <
> > > > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo,
> > > you're
> > > > > > right,
> > > > > > > > > > I'll add
> > > > > > > > > > > > > > them
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > > > > > > Regarding the general extended
> > > resource
> > > > > > > > > mechanism,
> > > > > > > > > > I
> > > > > > > > > > > > > second
> > > > > > > > > > > > > > > > > > Xintong's
> > > > > > > > > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and ResourceSpec once we
> > > > > > > > > > > > > > > > > > > > > > > support fine-grained GPU scheduling. As a first-step proposal, I
> > > > > > > > > > > > > > > > > > > > > > > prefer not to include it in the scope of this FLIP.
> > > > > > > > > > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > > > > > > > > > > > > > > > > > > correctly, it is just a code refactoring atm: we could extract the
> > > > > > > > > > > > > > > > > > > > > > > open/close/allocateExtendResources of GPUManager to that interface.
> > > > > > > > > > > > > > > > > > > > > > > If that is the case, +1 to do it during implementation.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark supports a general "Custom
> > > > > > > > > > > > > > > > > > > > > > > Resource Scheduling" before and decided to introduce a common
> > > > > > > > > > > > > > > > > > > > > > > resource configuration schema
> > > > > > > > > > > > > > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > > > > > > > to make it more extensible. I think the "resource" is a proper level
> > > > > > > > > > > > > > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM
> > > Xingbo
> > > > > > Huang <
> > > > > > > > > > > > > > > > hxbks2ks@gmail.com
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP,
> Yangze.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > There is no doubt that GPU
> resource
> > > > > > > > management
> > > > > > > > > > > > support
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > greatly
> > > > > > > > > > > > > > > > > > > > > > > facilitate the development of
> > > > AI-related
> > > > > > > > > > applications
> > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > PyFlink
> > > > > > > > > > > > > > > > > > > > users.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I have only one comment about
> this
> > > > wiki:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Regarding the names of several GPU configurations, I think it is
> > > > > > > > > > > > > > > > > > > > > > > > better to delete the "resource" field, which makes them consistent
> > > > > > > > > > > > > > > > > > > > > > > > with the names of other resource-related configurations in
> > > > > > > > > > > > > > > > > > > > > > > > TaskManagerOption.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline discussion about
> > > > > > > > > > > > > > > > > > > > > > > > > turning the "GPU Support" into a general "Extended Resource
> > > > > > > > > > > > > > > > > > > > > > > > > Support". We believe supporting extended resources through a
> > > > > > > > > > > > > > > > > > > > > > > > > general mechanism is definitely a good and extensible approach.
> > > > > > > > > > > > > > > > > > > > > > > > > The reason we propose this FLIP with its scope narrowed down to
> > > > > > > > > > > > > > > > > > > > > > > > > GPU alone is mainly the concern about the extra effort and review
> > > > > > > > > > > > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > To come up with a good design for a general extended resource
> > > > > > > > > > > > > > > > > > > > > > > > > management mechanism, we would need to investigate more on how
> > > > > > > > > > > > > > > > > > > > > > > > > people use different kinds of resources in practice. For GPU, we
> > > > > > > > > > > > > > > > > > > > > > > > > learnt such knowledge from the experts, Becket and his team
> > > > > > > > > > > > > > > > > > > > > > > > > members. But for FPGA or other potential extended resources, we
> > > > > > > > > > > > > > > > > > > > > > > > > don't have such convenient information sources, so the
> > > > > > > > > > > > > > > > > > > > > > > > > investigation would require more effort, which I tend to think is
> > > > > > > > > > > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On the other hand, we also
> looked
> > > > into
> > > > > > how
> > > > > > > > > > Spark
> > > > > > > > > > > > > > > supports a
> > > > > > > > > > > > > > > > > > general
> > > > > > > > > > > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > > > > > > > > > Resource Scheduling".
> Assuming we
> > > > want
> > > > > > to
> > > > > > > > > have
> > > > > > > > > > a
> > > > > > > > > > > > > > similar
> > > > > > > > > > > > > > > > > > general
> > > > > > > > > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > > > > > > > > resource mechanism in the
> future,
> > > > we
> > > > > > > > believe
> > > > > > > > > > that
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > current
> > > > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > > > > > > > > > design can be easily
> extended, in
> > > > an
> > > > > > > > > > incremental
> > > > > > > > > > > > way
> > > > > > > > > > > > > > > > without
> > > > > > > > > > > > > > > > > > too
> > > > > > > > > > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > > > > > > > > reworks.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > - The most important part is probably user interfaces. Spark
> > > > > > > > > > > > > > > > > > > > > > > > > offers configuration options to define the amount, discovery
> > > > > > > > > > > > > > > > > > > > > > > > > script and vendor (on k8s) on a per-resource-type basis [1], which
> > > > > > > > > > > > > > > > > > > > > > > > > is very similar to what we proposed in this FLIP. I think it's not
> > > > > > > > > > > > > > > > > > > > > > > > > necessary to expose config options in the general way atm, since
> > > > > > > > > > > > > > > > > > > > > > > > > we do not have support for other resource types now. If we later
> > > > > > > > > > > > > > > > > > > > > > > > > decide to have per-resource-type config options, we can keep
> > > > > > > > > > > > > > > > > > > > > > > > > backwards compatibility with the currently proposed options
> > > > > > > > > > > > > > > > > > > > > > > > > through simple key mapping.
> > > > > > > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can change it to an
> > > > > > > > > > > > > > > > > > > > > > > > > "Extended Resource Manager" (or whatever it is called). That
> > > > > > > > > > > > > > > > > > > > > > > > > should be a pure component-internal refactoring.
> > > > > > > > > > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are already fields
> > > > > > > > > > > > > > > > > > > > > > > > > for general extended resources. We can of course leverage them
> > > > > > > > > > > > > > > > > > > > > > > > > when supporting fine-grained GPU scheduling. That is also not in
> > > > > > > > > > > > > > > > > > > > > > > > > the scope of this first-step proposal, and would require FLIP-56
> > > > > > > > > > > > > > > > > > > > > > > > > to be finished first.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate FLIP
> > > > > > > > > > > > > > > > > > > > > > > > > for the general extended resource mechanism, and keep it in mind
> > > > > > > > > > > > > > > > > > > > > > > > > when discussing and implementing the current one.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18
> AM
> > > > Becket
> > > > > > Qin <
> > > > > > > > > > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > That's a good point,
> Stephan.
> > > It
> > > > > > makes
> > > > > > > > > total
> > > > > > > > > > > > sense
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > generalize
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > > resource management to
> support
> > > > custom
> > > > > > > > > > resources.
> > > > > > > > > > > > > > Having
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > allows
> > > > > > > > > > > > > > > > > > > > > > users
> > > > > > > > > > > > > > > > > > > > > > > > > to add new resources by
> > > > themselves.
> > > > > > The
> > > > > > > > > > general
> > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > > > > involve two different
> aspects:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type
> > > > > > definition.
> > > > > > > > It
> > > > > > > > > is
> > > > > > > > > > > > > > supported
> > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > > > > > > > > > resources in
> ResourceProfile
> > > and
> > > > > > > > > > ResourceSpec.
> > > > > > > > > > > > This
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > likely
> > > > > > > > > > > > > > > > > > > > cover
> > > > > > > > > > > > > > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > 2. The custom resource
> > > allocation
> > > > > > logic,
> > > > > > > > > > i.e. how
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > assign
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > resources
> > > > > > > > > > > > > > > > > > > > > > > > > to different tasks,
> operators,
> > > > and
> > > > > > so on.
> > > > > > > > > > This
> > > > > > > > > > > > may
> > > > > > > > > > > > > > > > require
> > > > > > > > > > > > > > > > > > two
> > > > > > > > > > > > > > > > > > > > > > levels /
> > > > > > > > > > > > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put into suitable
> > > > > > > > > > > > > > > > > > > > > > > > > > slots. It is done by the global RM and is not customizable right
> > > > > > > > > > > > > > > > > > > > > > > > > > now.
> > > > > > > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource to the operators in
> > > > > > > > > > > > > > > > > > > > > > > > > > TM. e.g. GPU 1 for operator A, GPU 2 for operator B. This step
> > > > > > > > > > > > > > > > > > > > > > > > > > is needed assuming the global RM does not distinguish individual
> > > > > > > > > > > > > > > > > > > > > > > > > > resources of the same type. It is true for memory, but not for
> > > > > > > > > > > > > > > > > > > > > > > > > > GPU.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it should discover
> > > > > > > > > > > > > > > > > > > > > > > > > > the physical GPU information and bind/match it to each operator.
> > > > > > > > > > > > > > > > > > > > > > > > > > Making this general will fill in the missing piece to support
> > > > > > > > > > > > > > > > > > > > > > > > > > custom resource type definition. But I'd avoid calling it an
> > > > > > > > > > > > > > > > > > > > > > > > > > "External Resource Manager" to avoid confusion with RM; maybe
> > > > > > > > > > > > > > > > > > > > > > > > > > something like "Operator Resource Assigner" would be more
> > > > > > > > > > > > > > > > > > > > > > > > > > accurate. So for each resource type users can have an optional
> > > > > > > > > > > > > > > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For memory, users don't
> > > > > > > > > > > > > > > > > > > > > > > > > > need this, but for other extended resources, users may need that.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Personally I think a
> pluggable
> > > > > > "Operator
> > > > > > > > > > Resource
> > > > > > > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > achievable
> > > > > > > > > > > > > > > > > > > > > > > > > in this FLIP. But I am
> also OK
> > > > with
> > > > > > > > having
> > > > > > > > > > that
> > > > > > > > > > > > in
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > > > > > > > > > the interface between the
> > > > "Operator
> > > > > > > > > Resource
> > > > > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > operator
> > > > > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > > > > take a while to settle
> down if
> > > we
> > > > > > want to
> > > > > > > > > > make it
> > > > > > > > > > > > > > > > generic.
> > > > > > > > > > > > > > > > > > But I
> > > > > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > > > > > > > > > > > implementation should take
> this
> > > > > > future
> > > > > > > > work
> > > > > > > > > > into
> > > > > > > > > > > > > > > > > > consideration so
> > > > > > > > > > > > > > > > > > > > > > that we
> > > > > > > > > > > > > > > > > > > > > > > > > don't need to break
> backwards
> > > > > > > > compatibility
> > > > > > > > > > once
> > > > > > > > > > > > we
> > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the mechanics of
> > > > > > > > > > > > > > > > > > > > > > > > > > > GPU-aware scheduling and GPU allocation, as I have no
> > > > > > > > > > > > > > > > > > > > > > > > > > > experience with that.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the proposal is if it makes
> > > > > > > > > > > > > > > > > > > > > > > > > > > sense to look at the "GPU Manager" as an "External Resource
> > > > > > > > > > > > > > > > > > > > > > > > > > > Manager", and GPU is one such resource.
> > > > > > > > > > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile and ResourceSpec,
> > > > > > > > > > > > > > > > > > > > > > > > > > > that is how it is done there.
> > > > > > > > > > > > > > > > > > > > > > > > > > > It has the advantage that it looks more extensible. Maybe
> > > > > > > > > > > > > > > > > > > > > > > > > > > there is a GPU Resource, a specialized NVIDIA GPU Resource,
> > > > > > > > > > > > > > > > > > > > > > > > > > > an FPGA Resource, an Alibaba TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support
> > > > > > > > > > > > > > > > > > > > > > > > > > > > is a must-have for machine learning use cases. Actually it
> > > > > > > > > > > > > > > > > > > > > > > > > > > > is one of the most frequently asked questions from the
> > > > > > > > > > > > > > > > > > > > > > > > > > > > users who are interested in using Flink for ML.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned
> > > > > > > > > > > > > > > > > > > > > > > > > > > > in the public interface section.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a public
> > > > > > > > > > > > > > > > > > > > > > > > > > > > API?
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > discussion, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting using of GPU in Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > is significant, especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > think it's a very good first step for Flink's GPU supports.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a task executor and
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > forward such requirements to the external resource managers (for
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of available GPU resources to operators.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and expose GPU resource information to the context of functions.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU discovery, in which we provide
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the privilege mode to help user to achieve worker-level isolation in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Yangze Guo <ka...@gmail.com>.
Hi @Till, @Xintong

Even without the credential concerns, I think replacing the interfaces
with configuration options is a good idea.
- Currently, I don't see any external resource that is incompatible
with this mechanism.
- It reduces the burden on users of implementing a plugin themselves.
WDYT?
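
To make the idea concrete, here is a minimal sketch of how the RM could
derive the pod/container request entries purely from configuration, with
no user code involved. It assumes the option names discussed in this
thread (external-resource.{resourceName}.amount and
external-resource.{resourceName}.yarn/kubernetes.key); the class and
method names are hypothetical and not part of Flink, and the example key
values (nvidia.com/gpu, yarn.io/gpu) are only sample settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: class and method names are hypothetical, not Flink API. */
final class ExternalResourceConfigSketch {

    /**
     * Builds the key-value pairs the RM would set on a pod/container request,
     * reading only configuration options (no plugin or user code is loaded).
     *
     * @param conf flat view of the cluster configuration (assumed layout)
     * @param resourceName name of the external resource, e.g. "gpu"
     * @param deployment either "yarn" or "kubernetes"
     */
    static Map<String, Long> requestEntries(
            Map<String, String> conf, String resourceName, String deployment) {
        String prefix = "external-resource." + resourceName + ".";
        String amount = conf.get(prefix + "amount");
        String key = conf.get(prefix + deployment + ".key");

        Map<String, Long> entries = new LinkedHashMap<>();
        // If either option is missing, nothing is requested for this deployment.
        if (amount != null && key != null) {
            entries.put(key, Long.parseLong(amount.trim()));
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("external-resource.gpu.amount", "2");
        conf.put("external-resource.gpu.kubernetes.key", "nvidia.com/gpu");
        conf.put("external-resource.gpu.yarn.key", "yarn.io/gpu");

        System.out.println(requestEntries(conf, "gpu", "kubernetes"));
        System.out.println(requestEntries(conf, "gpu", "yarn"));
    }
}
```

A resource that has no key configured for a given deployment simply
yields no entry, which mirrors "not implementing the deployment specific
interface" in the plugin-based design.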

Best,
Yangze Guo

On Mon, Mar 30, 2020 at 5:44 PM Xintong Song <to...@gmail.com> wrote:
>
> I also agree that the pluggable ExternalResourceDriver should be loaded by
> the cluster class loader. Although the plugin might be implemented by users,
> external resources (as part of task executor resources) should be cluster
> configurations, unlike job-level user code such as UDFs, because the task
> executors belong to the cluster rather than to jobs.
>
>
> IIUC, the concern Stephan raised is about the potential credential problem
> when executing user codes on RM with cluster class loader. The concern
> makes sense to me, and I think what Yangze suggested should be a good
> approach trying to prevent such credential problems. The only purpose we
> tried to execute user codes (i.e. getKubernetes/YarnExternalResource) on RM
> was that, we need to set these key-value pairs to pod/container requests.
> Replacing the interfaces getKubernetes/YarnExternalResource with
> configuration options
> 'external-resource.{resourceName}.yarn/kubernetes.key/amount',
> we can still fulfill that purpose, without the credential risks.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann <tr...@apache.org> wrote:
>
> > At the moment the RM does not have a user code class loader and I agree
> > with Stephan that it should stay like this. This, however, does not mean
> > that we cannot support pluggable components in the RM. As long as the
> > plugins are on the system's class path, it should be fine for the RM to
> > load them. For example, we could add external resources via Flink's plugin
> > mechanism or something similar.
> >
> > A very simple implementation of such an ExternalResourceDriver could be a
> > class which simply returns what is written in the flink-conf.yaml under a
> > given key.
> >
> > Cheers,
> > Till
> >
> > On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Hi, Stephan,
> > >
> > > I see your concern and I totally agree with you.
> > >
> > > The interface on RM side is now `Map<String key, String/Long value>
> > > getYarn/KubernetesExternalResource()`. The only valid information RM
> > > get from it is the configuration key of that external resource in
> > > Yarn/K8s. The "String/Long value" would be the same as the
> > > external-resource.{resourceName}.amount.
> > > So, I think it makes sense to replace these two interfaces with two
> > > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> > > may lose some extensibility, but AFAIK it could work with common
> > > external resources like GPU, FPGA. WDYT?
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > > Maybe one final comment: It is probably not an issue, but let's try and
> > > > > keep user code (via user code classloader) out of the ResourceManager, if
> > > > > possible.
> > > > >
> > > > > As background:
> > > > >
> > > > > There were thoughts in the past to support setups where the RM must run
> > > > > with "superuser" credentials, but we cannot run JM/TM with these
> > > > > credentials, as the user code might access them otherwise.
> > > > > This is actually possible today, you can run the RM in a different JVM or
> > > > > in a different container, and give it more credentials than JMs / TMs. But
> > > > > for this to be feasible, we cannot allow any user-defined code to be in the
> > > > > JVM, because that instantaneously breaks the isolation of credentials.
> > > >
> > > >
> > > >
> > > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <ka...@gmail.com> wrote:
> > > >
> > > > > Thanks for the feedback, @Till and @Xintong.
> > > > >
> > > > > Regarding separating the interface, I'm also +1 with it.
> > > > >
> > > > > > Regarding the resource allocation interface, true, it's dangerous to
> > > > > > give much access to user codes. Changing the return type to Map<String
> > > > > > key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > > > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > > > > could also free us from the potential dependency issue as well.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > >
> > > > > > > Thanks for updating the FLIP, Yangze.
> > > > > > >
> > > > > > > I agree with Till that we probably want to separate the K8s/Yarn decorator
> > > > > > > calls. Users can still configure one driver class, and we can use
> > > > > > > `instanceof` to check whether the driver implemented K8s/Yarn specific
> > > > > > > interfaces.
> > > > > > >
> > > > > > > Moreover, I'm not sure about exposing entire `ContainerRequest` / `Pod`
> > > > > > > (`AbstractKubernetesStepDecorator` directly manipulates on `Pod`) to user
> > > > > > > codes. It gives more access to user codes than needed for defining external
> > > > > > > resource, which might cause problems. Instead, I would suggest to have
> > > > > > > interface like `Map<String key, String value>
> > > > > > > getYarn/KubernetesExternalResource()` and assemble them into
> > > > > > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <trohrmann@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > I'm a bit late to the party. I think the current proposal looks good.
> > > > > > > >
> > > > > > > > Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> > > > > > > > would suggest to not include the decorator calls for Kubernetes and Yarn in
> > > > > > > > the base interface. Instead I would suggest to segregate the deployment
> > > > > > > > specific decorator calls into separate interfaces. That way an
> > > > > > > > ExternalResourceDriver does not have to support all deployments from the
> > > > > > > > very beginning. Moreover, some resources might not be supported by a
> > > > > > > > specific deployment target and the natural way to express this would be to
> > > > > > > > not implement the respective deployment specific interface.
> > > > > > > >
> > > > > > > > Moreover, having void
> > > > > > > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > > > > > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > > > > > > classpath whenever the external resource driver is being used.
> > > > > > > >
> > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <se...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Nice, thanks a lot!
> > > > > > > > >
> > > > > > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > > > > > >
> > > > > > > > > > I've updated the FLIP accordingly. I do not add a
> > > > > > > > > > ResourceInfoProvider. Instead, I introduce the ExternalResourceDriver,
> > > > > > > > > > which takes the responsibility of all relevant operations on both RM
> > > > > > > > > > and TM sides.
> > > > > > > > > > After a rethink about decoupling the management of external resources
> > > > > > > > > > from TaskExecutor, I think we could do the same thing on the
> > > > > > > > > > ResourceManager side. We do not need to add a specific allocation
> > > > > > > > > > logic to the ResourceManager each time we add a specific external
> > > > > > > > > > resource.
> > > > > > > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > > > > > > containerRequest.
> > > > > > > > > > - For Kubernetes, ExternalResourceDriver could provide a decorator for
> > > > > > > > > > the TM pod.
> > > > > > > > > >
> > > > > > > > > > In this way, just like MetricReporter, we allow users to define their
> > > > > > > > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > > > > > > > separation of concerns. For more details, please take a look at [1].
> > > > > > > > > >
> > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > This sounds good to go ahead from my side.
> > > > > > > > > > >
> > > > > > > > > > > I like the approach that Becket suggested - in that case the core
> > > > > > > > > > > abstraction that everyone would need to understand would be "external
> > > > > > > > > > > resource allocation" and the "ResourceInfoProvider", and the GPU specific
> > > > > > > > > > > code would be a specific implementation only known to that component that
> > > > > > > > > > > allocates the external resource. That fits the separation of concerns
> > > > > > > > > > > well.
> > > > > > > > > > >
> > > > > > > > > > > I also understand that it should not be over-engineered in the first
> > > > > > > > > > > version, so some simplification makes sense, and then gradually expand from
> > > > > > > > > > > there.
> > > > > > > > > > >
> > > > > > > > > > > So +1 to go ahead with what was suggested above (Xintong / Becket) from my
> > > > > > > > > > > side.
> > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > > > > > > >
> > > > > > > > > > > > @Stephan
> > > > > > > > > > > >
> > > > > > > > > > > > I see your concern, and I completely agree with you that we should first
> > > > > > > > > > > > think about the "library" / "plugin" / "extension" style if possible.
> > > > > > > > > > > >
> > > > > > > > > > > > If GPUs are sliced and assigned during scheduling, there may be reason,
> > > > > > > > > > > > > although it looks that it would belong to the slot then. Is that what we
> > > > > > > > > > > > > are doing here?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > In the current proposal, we do not have the GPUs sliced and assigned to
> > > > > > > > > > > > slots, because it could be problematic without dynamic slot allocation.
> > > > > > > > > > > > E.g., the number of GPUs might not be evenly divisible by the number of
> > > > > > > > > > > > slots.
> > > > > > > > > > >
> > > > > > > > > > > > I think it makes sense to eventually have the GPUs assigned to slots. Even
> > > > > > > > > > > > then, we might still need a TM level GPUManager (or ResourceProvider like
> > > > > > > > > > > > Becket suggested). For memory, in each slot we can simply request the
> > > > > > > > > > > > amount of memory, leaving it to JVM / OS to decide which memory (address)
> > > > > > > > > > > > should be assigned. For GPU, and potentially other resources like FPGA, we
> > > > > > > > > > > > need to explicitly specify which GPU (index) should be used. Therefore, we
> > > > > > > > > > > > need some component at the TM level to coordinate which slot uses which
> > > > > > > > > > > > GPU.
> > > > > > > > > > > >
> > > > > > > > > > > > IMO, unless we say Flink will not support slot-level GPU slicing at least
> > > > > > > > > > > > in the foreseeable future, I don't see a good way to avoid touching the TM
> > > > > > > > > > > > core. To that end, I think Becket's suggestion points to a good direction,
> > > > > > > > > > > > that supports more features (GPU, FPGA, etc.) with less coupling to the TM
> > > > > > > > > > > > core (only needs to understand the general interfaces). The detailed
> > > > > > > > > > > > implementation for specific resource types can even be encapsulated as a
> > > > > > > > > > > > library.
> > > > > > > > > > >
> > > > > > > > > > > @Becket
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for sharing your thought on the final state. Despite the details how
> > > > > > > > > > > > the interfaces should look like, I think this is a really good abstraction
> > > > > > > > > > > > for supporting general resource types.
> > > > > > > > > > > >
> > > > > > > > > > > > I'd like to further clarify that, the following three things are all that
> > > > > > > > > > > > the "Flink core" needs to understand.
> > > > > > > > > > > >
> > > > > > > > > > > >    - The *amount* of resource, for scheduling. Actually, we already have
> > > > > > > > > > > >    the Resource class in ResourceProfile and ResourceSpec for extended
> > > > > > > > > > > >    resource. It's just not really used.
> > > > > > > > > > > >    - The *info*, that Flink provides to the operators / user codes.
> > > > > > > > > > > >    - The *provider*, which generates the info based on the amount.
> > > > > > > > > > > >
> > > > > > > > > > > > The "core" does not need to understand the specific implementation details
> > > > > > > > > > > > of the above three. They can even be implemented in a 3rd-party library.
> > > > > > > > > > > > Similar to how we allow users to define their custom MetricReporter.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the comment, Stephan.
> > > > > > > > > > > > >
> > > > > > > > > > > > >   - If everything becomes a "core feature", it will make the project hard
> > > > > > > > > > > > > > to develop in the future. Thinking "library" / "plugin" / "extension" style
> > > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Completely agree. It is much more important to design a mechanism than
> > > > > > > > > > > > > focusing on a specific case. Here is what I am thinking to fully support
> > > > > > > > > > > > > custom resource management:
> > > > > > > > > > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
> > > > > > > > > > > > > resource and the amount required. They will be used to find suitable TMs
> > > > > > > > > > > > > slots to run the tasks. At this point, the resources are only measured by
> > > > > > > > > > > > > amount, i.e. they do not have individual ID.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to identify
> > > > > > > > > > > > > and provide the detail information of the individual resource, e.g. GPU
> > > > > > > > > > > > > ID. It is important because the operator may have to explicitly interact
> > > > > > > > > > > > > with the physical resource it uses. The ResourceInfoProvider might look
> > > > > > > > > > > > > like something below.
> > > > > > > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > > > > > > > > > >         OperatorId opId, ResourceProfile resourceProfile);
> > > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > > - There could be several "*ResourceInfoProvider*" configured on the TM to
> > > > > > > > > > > > > retrieve the information for different resources.
> > > > > > > > > > > > > - The TM will be responsible to assign those individual resources to each
> > > > > > > > > > > > > operator according to their requested amount.
> > > > > > > > > > > > > - The operators will be able to get the ResourceInfo from their
> > > > > > > > > > > > > RuntimeContext.
> > > > > > > > > > > >
> > > > > > > > > > > > > If we agree this is a reasonable final state, we can adapt the current FLIP
> > > > > > > > > > > > > to it. In fact it does not sound a big change to me. All the proposed
> > > > > > > > > > > > > configuration can be as is, it is just that Flink itself won't care about
> > > > > > > > > > > > > them, instead a GPUInfoProvider implementing the ResourceInfoProvider will
> > > > > > > > > > > > > use them.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> > > > > sewen@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi all!
> > > > > > > > > > > > >
> > > > > > > > > > > > > The main point I wanted to throw into the discussion
> > > is the
> > > > > > > > > following:
> > > > > > > > > > > > >   - With more and more use cases, more and more tools
> > > go
> > > > > into
> > > > > > > > Flink
> > > > > > > > > > > > >   - If everything becomes a "core feature", it will
> > > make
> > > > > the
> > > > > > > > > project
> > > > > > > > > > > hard
> > > > > > > > > > > > > to develop in the future. Thinking "library" /
> > > "plugin" /
> > > > > > > > > "extension"
> > > > > > > > > > > > style
> > > > > > > > > > > > > where possible helps.
> > > > > > > > > > > > >
> > > > > > > > > > > > >   - A good thought experiment is always: How many
> > > future
> > > > > > > > developers
> > > > > > > > > > > have
> > > > > > > > > > > > to
> > > > > > > > > > > > > interact with this code (and possibly understand it
> > > > > partially),
> > > > > > > > > even if
> > > > > > > > > > > > the
> > > > > > > > > > > > > features they touch have nothing to do with GPU
> > > support. If
> > > > > > > many
> > > > > > > > > > > > > contributors to unrelated features will have to touch
> > > it
> > > > > and
> > > > > > > > > understand
> > > > > > > > > > > > it,
> > > > > > > > > > > > > then let's think if there is a different solution.
> > > Maybe
> > > > > there
> > > > > > > is
> > > > > > > > > not,
> > > > > > > > > > > > but
> > > > > > > > > > > > > then we should be sure why.
> > > > > > > > > > > > >
> > > > > > > > > > > > >   - That led me to raising this issue: If the GPU
> > > manager
> > > > > > > > becomes a
> > > > > > > > > > > core
> > > > > > > > > > > > > service in the TaskManager, Environment,
> > > RuntimeContext,
> > > > > etc.
> > > > > > > > then
> > > > > > > > > > > > everyone
> > > > > > > > > > > > > > developing TM and streaming tasks needs to understand
> > > the
> > > > > GPU
> > > > > > > > > manager.
> > > > > > > > > > > > That
> > > > > > > > > > > > > seems oddly specific, is my impression.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Access to configuration seems not the right reason to
> > > do
> > > > > that.
> > > > > > > We
> > > > > > > > > > > should
> > > > > > > > > > > > > expose the Flink configuration from the
> > RuntimeContext
> > > > > anyways.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If GPUs are sliced and assigned during scheduling,
> > > there
> > > > > may be
> > > > > > > > > reason,
> > > > > > > > > > > > > although it looks that it would belong to the slot
> > > then. Is
> > > > > > > that
> > > > > > > > > what
> > > > > > > > > > > we
> > > > > > > > > > > > > are doing here?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Stephan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > IMO, eventually an operator should only see info of
> > > GPUs
> > > > > that
> > > > > > > > are
> > > > > > > > > > > > > dedicated
> > > > > > > > > > > > > > for it, instead of all GPUs on the
> > machine/container
> > > in
> > > > > the
> > > > > > > > > current
> > > > > > > > > > > > > design.
> > > > > > > > > > > > > > It does not make sense to let the user who writes a
> > > UDF
> > > > > to
> > > > > > > > worry
> > > > > > > > > > > about
> > > > > > > > > > > > > > coordination among multiple operators running on
> > the
> > > same
> > > > > > > > > machine.
> > > > > > > > > > > And
> > > > > > > > > > > > if
> > > > > > > > > > > > > > we want to limit the GPU info an operator sees, we
> > > > > should not
> > > > > > > > > let the
> > > > > > > > > > > > > > > operator instantiate GPUManager, which means we
> > > have
> > > > > to
> > > > > > > > expose
> > > > > > > > > > > > > something
> > > > > > > > > > > > > > through runtime context, either GPU info or some
> > > kind of
> > > > > > > > limited
> > > > > > > > > > > access
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > the GPUManager.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> > > > > > > > becket.qin@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It probably makes sense for us to first agree on
> > the
> > > > > final
> > > > > > > > > state.
> > > > > > > > > > > More
> > > > > > > > > > > > > > > specifically, will the resource info be exposed
> > > through
> > > > > > > > runtime
> > > > > > > > > > > > context
> > > > > > > > > > > > > > > eventually?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If that is the final state and we have a seamless
> > > > > migration
> > > > > > > > > story
> > > > > > > > > > > > from
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > FLIP to that final state, Personally I think it
> > is
> > > OK
> > > > > to
> > > > > > > > > expose the
> > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > info in the runtime context.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > > > > > > > > > tonysong820@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > > > > > I think what Stephan means (@Stephan, please
> > > correct
> > > > > me
> > > > > > > if
> > > > > > > > > I'm
> > > > > > > > > > > > wrong)
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > that, we might not need to hold and maintain
> > the
> > > > > > > GPUManager
> > > > > > > > > as a
> > > > > > > > > > > > > > service
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > TaskManagerServices or RuntimeContext. An
> > > > > alternative is
> > > > > > > to
> > > > > > > > > > > create
> > > > > > > > > > > > /
> > > > > > > > > > > > > > > > retrieve the GPUManager only in the operators
> > > that
> > > > > need
> > > > > > > it,
> > > > > > > > > e.g.,
> > > > > > > > > > > > > with
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > > > > > I agree with you on excluding GPUManager from
> > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >    - For the first step, where we provide
> > unified
> > > > > > > TM-level
> > > > > > > > > GPU
> > > > > > > > > > > > > > > information
> > > > > > > > > > > > > > > >    to all operators, it should be fine to have
> > > > > operators
> > > > > > > > > access /
> > > > > > > > > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > > > > > > > > >    - In future, we might have some more
> > > fine-grained
> > > > > GPU
> > > > > > > > > > > > management,
> > > > > > > > > > > > > > > where
> > > > > > > > > > > > > > > >    we need to maintain GPUManager as a service
> > > and
> > > > > put
> > > > > > > GPU
> > > > > > > > > info
> > > > > > > > > > > in
> > > > > > > > > > > > > slot
> > > > > > > > > > > > > > > >    profiles. But at least for now it's not
> > > necessary
> > > > > to
> > > > > > > > > introduce
> > > > > > > > > > > > > such
> > > > > > > > > > > > > > > >    complexity.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > However, I have some concerns on excluding
> > > GPUManager
> > > > > > > from
> > > > > > > > > > > > > > RuntimeContext
> > > > > > > > > > > > > > > > and let operators access it directly.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >    - Configurations needed for creating the
> > > > > GPUManager is
> > > > > > > > not
> > > > > > > > > > > > always
> > > > > > > > > > > > > > > >    available for operators.
> > > > > > > > > > > > > > > >    - If later we want to have fine-grained
> > > control
> > > > > over
> > > > > > > GPU
> > > > > > > > > > > (e.g.,
> > > > > > > > > > > > > > > >    operators in each slot can only see GPUs
> > > reserved
> > > > > for
> > > > > > > > that
> > > > > > > > > > > > slot),
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I would suggest wrapping the GPUManager behind
> > > > > > > > RuntimeContext
> > > > > > > > > and
> > > > > > > > > > > > only
> > > > > > > > > > > > > > > > expose the GPUInfo to users. For now, we can
> > > declare
> > > > > a
> > > > > > > > method
> > > > > > > > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a
> > default
> > > > > > > definition
> > > > > > > > > that
> > > > > > > > > > > > > calls
> > > > > > > > > > > > > > > > `GPUManager.get()` to get the lazily-created
> > > > > GPUManager.
> > > > > > > If
> > > > > > > > > later
> > > > > > > > > > > > we
> > > > > > > > > > > > > > want
> > > > > > > > > > > > > > > > to create / retrieve GPUManager in a different
> > > way,
> > > > > we
> > > > > > > can
> > > > > > > > > simply
> > > > > > > > > > > > > > change
> > > > > > > > > > > > > > > > how `getGPUInfo` is implemented, without
> > needing
> > > to
> > > > > > > change
> > > > > > > > > any
> > > > > > > > > > > > public
> > > > > > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > > > >
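The suggestion above could be sketched in Java roughly as follows. RuntimeContextSketch stands in for Flink's RuntimeContext, and GPUInfo, GPUManager.get() and the hard-coded two-GPU discovery result are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class GpuInfoSketch {

    /** Read-only description of one GPU; the only type user code sees. */
    static final class GPUInfo {
        private final int index;

        GPUInfo(int index) {
            this.index = index;
        }

        int getIndex() {
            return index;
        }
    }

    /** Hypothetical manager, created lazily on first access. */
    static final class GPUManager {
        private static GPUManager instance;
        private final List<GPUInfo> gpus;

        private GPUManager(List<GPUInfo> gpus) {
            this.gpus = gpus;
        }

        static synchronized GPUManager get() {
            if (instance == null) {
                // Placeholder for running the configured discovery script.
                instance = new GPUManager(
                        Arrays.asList(new GPUInfo(0), new GPUInfo(1)));
            }
            return instance;
        }

        List<GPUInfo> getGPUInfo() {
            return Collections.unmodifiableList(gpus);
        }
    }

    /** Stand-in for RuntimeContext: exposes GPUInfo, hides the manager. */
    interface RuntimeContextSketch {
        default List<GPUInfo> getGPUInfo() {
            return GPUManager.get().getGPUInfo();
        }
    }

    public static void main(String[] args) {
        RuntimeContextSketch ctx = new RuntimeContextSketch() {};
        for (GPUInfo gpu : ctx.getGPUInfo()) {
            System.out.println("GPU index: " + gpu.getIndex());
        }
    }
}
```

Because user functions only call getGPUInfo(), the way the manager is created or scoped (per TM, per slot) could change later without touching the public interface.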
> > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > > > > Do you mean Minicluster? Yes, it makes sense
> > to
> > > > > share
> > > > > > > the
> > > > > > > > > GPU
> > > > > > > > > > > > > Manager
> > > > > > > > > > > > > > > > > in such scenario.
> > > > > > > > > > > > > > > > > If that's what you worry about, I'm +1 for
> > > holding
> > > > > > > > > > > > > > > > > GPUManager(ExternalResourceManagers) in
> > > > > TaskExecutor
> > > > > > > > > instead of
> > > > > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Regarding the RuntimeContext/FunctionContext,
> > > it
> > > > > just
> > > > > > > > > holds the
> > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > info instead of the GPU Manager. AFAIK, it's
> > > the
> > > > > only
> > > > > > > > > place we
> > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > pass GPU info to the
> > > > > RichFunction/UserDefinedFunction.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac
> > Godfried
> > > <
> > > > > > > > > > > > > isaac@paddlesoft.net
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> > > > > > > > sewen@apache.org
> > > > > > > > > > > wrote
> > > > > > > > > > > > > > ----
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Can we somehow keep this out of the
> > > > > TaskManager
> > > > > > > > > services
> > > > > > > > > > > > > > > > > > > I fear that we could not. IMO, the
> > > > > GPUManager(or
> > > > > > > > > > > > > > > > > > > ExternalServicesManagers in future) is
> > > > > conceptually
> > > > > > > > > one of
> > > > > > > > > > > > the
> > > > > > > > > > > > > > task
> > > > > > > > > > > > > > > > > > > manager services, just like MemoryManager
> > > > > before
> > > > > > > > 1.10.
> > > > > > > > > > > > > > > > > > > - It maintains/holds the GPU resource at
> > TM
> > > > > level
> > > > > > > and
> > > > > > > > > all
> > > > > > > > > > > of
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > operators allocate the GPU resources from
> > > it.
> > > > > So,
> > > > > > > it
> > > > > > > > > should
> > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > > > > > > > > - We could add a collection called
> > > > > > > > > ExternalResourceManagers
> > > > > > > > > > > > to
> > > > > > > > > > > > > > hold
> > > > > > > > > > > > > > > > > > > all managers of other external resources
> > > in the
> > > > > > > > future.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can you help me understand why this needs
> > the
> > > > > > > addition
> > > > > > > > in
> > > > > > > > > > > > > > > > > > TaskManagerServices
> > > > > > > > > > > > > > > > > > or in the RuntimeContext?
> > > > > > > > > > > > > > > > > > Are you worried about the case when
> > multiple
> > > Task
> > > > > > > > > Executors
> > > > > > > > > > > run
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > JVM? That's not common, but wouldn't it
> > > actually
> > > > > be
> > > > > > > > good
> > > > > > > > > in
> > > > > > > > > > > > that
> > > > > > > > > > > > > > case
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > share the GPU Manager, given that the GPU
> > is
> > > > > shared?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > > > > > > > > In this FLIP, operators need the
> > > information.
> > > > > Thus,
> > > > > > > > we
> > > > > > > > > > > expose
> > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > information to the
> > > > > RuntimeContext/FunctionContext.
> > > > > > > > The
> > > > > > > > > slot
> > > > > > > > > > > > > > profile
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > not aware of GPU resources as GPU is TM
> > > level
> > > > > > > > resource
> > > > > > > > > now.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Can the GPU Manager be a "self
> > contained"
> > > > > thing
> > > > > > > > that
> > > > > > > > > > > simply
> > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > configuration, and then abstracts
> > > everything
> > > > > > > > > internally?
> > > > > > > > > > > > > > > > > > > Yes, we just pass the path/args of the
> > > discover
> > > > > > > > script
> > > > > > > > > and
> > > > > > > > > > > > how
> > > > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > > > GPUs per TM to it. It takes the
> > > responsibility
> > > > > to
> > > > > > > get
> > > > > > > > > the
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > information and expose them to the
> > > > > > > > > > > > > RuntimeContext/FunctionContext
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > Operators. Meanwhile, we'd better not
> > allow
> > > > > > > operators
> > > > > > > > > to
> > > > > > > > > > > > > directly
> > > > > > > > > > > > > > > > > > > > access GPUManager; they should get what
> > they
> > > want
> > > > > > > from
> > > > > > > > > > > Context.
> > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > then decouple the
> > interface/implementation
> > > of
> > > > > > > > > GPUManager
> > > > > > > > > > > and
> > > > > > > > > > > > > > Public
> > > > > > > > > > > > > > > > > > > API.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan
> > > Ewen <
> > > > > > > > > > > > sewen@apache.org
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > It sounds fine to initially start with
> > > GPU
> > > > > > > specific
> > > > > > > > > > > support
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > > > generalizing this once we better
> > > understand
> > > > > the
> > > > > > > > > space.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > About the implementation suggested in
> > > > > FLIP-108:
> > > > > > > > > > > > > > > > > > > > - Can we somehow keep this out of the
> > > > > TaskManager
> > > > > > > > > > > services?
> > > > > > > > > > > > > > > > Anything
> > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > have to pull through all layers of the
> > TM
> > > > > makes
> > > > > > > the
> > > > > > > > > TM
> > > > > > > > > > > > > > components
> > > > > > > > > > > > > > > > yet
> > > > > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > - What parts need information about
> > this?
> > > > > > > > > > > > > > > > > > > > -> do the slot profiles need
> > information
> > > > > about
> > > > > > > the
> > > > > > > > > GPU?
> > > > > > > > > > > > > > > > > > > > -> Can the GPU Manager be a "self
> > > contained"
> > > > > > > thing
> > > > > > > > > that
> > > > > > > > > > > > > simply
> > > > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > > > > the configuration, and then abstracts
> > > > > everything
> > > > > > > > > > > > internally?
> > > > > > > > > > > > > > > > > Operators
> > > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze
> > > Guo <
> > > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo,
> > you're
> > > > > right,
> > > > > > > > > I'll add
> > > > > > > > > > > > > them
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > > > > > Regarding the general extended
> > resource
> > > > > > > > mechanism,
> > > > > > > > > I
> > > > > > > > > > > > second
> > > > > > > > > > > > > > > > > Xintong's
> > > > > > > > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > > > > > > > - It's better to leverage
> > > ResourceProfile
> > > > > and
> > > > > > > > > > > > ResourceSpec
> > > > > > > > > > > > > > > after
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > support fine-grained GPU
> > > scheduling. As
> > > > > a
> > > > > > > > first
> > > > > > > > > step
> > > > > > > > > > > > > > > > proposal, I
> > > > > > > > > > > > > > > > > > > > > prefer to not include it in the scope
> > > of
> > > > > this
> > > > > > > > FLIP.
> > > > > > > > > > > > > > > > > > > > > - Regarding the "Extended Resource
> > > > > Manager",
> > > > > > > if I
> > > > > > > > > > > > > understand
> > > > > > > > > > > > > > > > > > > > > > correctly, it is just a code refactoring
> > > atm,
> > > > > we
> > > > > > > > could
> > > > > > > > > > > > extract
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > open/close/allocateExtendResources of
> > > > > > > GPUManager
> > > > > > > > to
> > > > > > > > > > > that
> > > > > > > > > > > > > > > > > interface. If
> > > > > > > > > > > > > > > > > > > > > that is the case, +1 to do it during
> > > > > > > > > implementation.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > > > > > As Xintong said, we looked into how
> > > Spark
> > > > > > > > supports
> > > > > > > > > a
> > > > > > > > > > > > > general
> > > > > > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > > > > > > Resource Scheduling" before and
> > > decided to
> > > > > > > > > introduce a
> > > > > > > > > > > > > common
> > > > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > > > > > to make it more extensible. I think
> > the
> > > > > > > > "resource"
> > > > > > > > > is a
> > > > > > > > > > > > > > proper
> > > > > > > > > > > > > > > > > level
> > > > > > > > > > > > > > > > > > > > > to contain all the configs of
> > extended
> > > > > > > resources.
> > > > > > > > > > > > > > > > > > > > >
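As an illustration of the key schema above, a small Java helper that derives the per-resource option names. The class and method names are hypothetical; only the key pattern taskmanager.resource.{resourceName}.amount/discovery-script comes from this thread.

```java
public class ExternalResourceKeys {

    private static final String PREFIX = "taskmanager.resource.";

    /** Key for the number of units of the named resource per TM. */
    static String amountKey(String resourceName) {
        return PREFIX + resourceName + ".amount";
    }

    /** Key for the discovery script of the named resource. */
    static String discoveryScriptKey(String resourceName) {
        return PREFIX + resourceName + ".discovery-script";
    }

    public static void main(String[] args) {
        // The same two helpers cover GPU and any later resource type.
        System.out.println(amountKey("gpu"));
        System.out.println(discoveryScriptKey("gpu"));
        System.out.println(amountKey("fpga"));
    }
}
```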
> > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM
> > Xingbo
> > > > > Huang <
> > > > > > > > > > > > > > > hxbks2ks@gmail.com
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > There is no doubt that GPU resource
> > > > > > > management
> > > > > > > > > > > support
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > greatly
> > > > > > > > > > > > > > > > > > > > > > facilitate the development of
> > > AI-related
> > > > > > > > > applications
> > > > > > > > > > > > by
> > > > > > > > > > > > > > > > PyFlink
> > > > > > > > > > > > > > > > > > > users.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I have only one comment about this
> > > wiki:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Regarding the names of several GPU
> > > > > > > > > configurations, I
> > > > > > > > > > > > > think
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > better
> > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > delete the resource field to make it
> > > > > consistent
> > > > > > > > > with
> > > > > > > > > > > the
> > > > > > > > > > > > > > names
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > > > > resource-related configurations in
> > > > > > > > > TaskManagerOption.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > e.g.
> > > > > > > > > taskmanager.resource.gpu.discovery-script.path
> > > > > > > > > > > ->
> > > > > > > > > > > > > > > > > > > > > >
> > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also
> > > had
> > > > > an
> > > > > > > > > offline
> > > > > > > > > > > > > > discussion
> > > > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > > > > making
> > > > > > > > > > > > > > > > > > > > > > > the "GPU Support" as some general
> > > > > "Extended
> > > > > > > > > > > Resource
> > > > > > > > > > > > > > > > Support".
> > > > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > > > > > > believe
> > > > > > > > > > > > > > > > > > > > > > > supporting extended resources in
> > a
> > > > > general
> > > > > > > > > > > mechanism
> > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > definitely
> > > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > > > > > and extensible way. The reason we
> > > > > propose
> > > > > > > > this
> > > > > > > > > FLIP
> > > > > > > > > > > > > > > narrowing
> > > > > > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > > > > scope
> > > > > > > > > > > > > > > > > > > > > > > down to GPU alone, is mainly for
> > > the
> > > > > > > concern
> > > > > > > > on
> > > > > > > > > > > extra
> > > > > > > > > > > > > > > efforts
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > review
> > > > > > > > > > > > > > > > > > > > > > > capacity needed for a general
> > > > > mechanism.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > To come up with a good design for
> > a
> > > > > general
> > > > > > > > > extended
> > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > > > > > > > > mechanism, we would need to
> > > investigate
> > > > > > > more
> > > > > > > > > on how
> > > > > > > > > > > > > > people
> > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > > > > > > > > kinds of resources in practice.
> > For
> > > > > GPU, we
> > > > > > > > > learnt
> > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > knowledge
> > > > > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > experts, Becket and his team
> > > members.
> > > > > But
> > > > > > > for
> > > > > > > > > FPGA,
> > > > > > > > > > > > or
> > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > potential
> > > > > > > > > > > > > > > > > > > > > > > extended resources, we don't have
> > > such
> > > > > > > > > convenient
> > > > > > > > > > > > > > > information
> > > > > > > > > > > > > > > > > > > sources,
> > > > > > > > > > > > > > > > > > > > > > > > making the investigation require
> > > more
> > > > > > > > efforts,
> > > > > > > > > > > > which I
> > > > > > > > > > > > > > > tend
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On the other hand, we also looked
> > > into
> > > > > how
> > > > > > > > > Spark
> > > > > > > > > > > > > > supports a
> > > > > > > > > > > > > > > > > general
> > > > > > > > > > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > > > > > > > > Resource Scheduling". Assuming we
> > > want
> > > > > to
> > > > > > > > have
> > > > > > > > > a
> > > > > > > > > > > > > similar
> > > > > > > > > > > > > > > > > general
> > > > > > > > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > > > > > > > resource mechanism in the future,
> > > we
> > > > > > > believe
> > > > > > > > > that
> > > > > > > > > > > the
> > > > > > > > > > > > > > > current
> > > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > > > > > > > > design can be easily extended, in
> > > an
> > > > > > > > > incremental
> > > > > > > > > > > way
> > > > > > > > > > > > > > > without
> > > > > > > > > > > > > > > > > too
> > > > > > > > > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > > > > > > > reworks.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > - The most important part is
> > > probably
> > > > > user
> > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > > > Spark
> > > > > > > > > > > > > > > > > > > offers
> > > > > > > > > > > > > > > > > > > > > > > configuration options to define
> > the
> > > > > amount,
> > > > > > > > > > > discovery
> > > > > > > > > > > > > > > script
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > vendor
> > > > > > > > > > > > > > > > > > > > > > > (on
> > > > > > > > > > > > > > > > > > > > > > > > k8s) on a per resource type basis
> > > [1],
> > > > > which
> > > > > > > > is
> > > > > > > > > very
> > > > > > > > > > > > > > similar
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > > proposed in this FLIP. I think
> > > it's not
> > > > > > > > > necessary
> > > > > > > > > > > to
> > > > > > > > > > > > > > expose
> > > > > > > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > > > > options
> > > > > > > > > > > > > > > > > > > > > > > in the general way atm, since we
> > > do not
> > > > > > > have
> > > > > > > > > > > supports
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > > > > > types now. If later we decided to
> > > have
> > > > > per
> > > > > > > > > resource
> > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > > > > options, we
> > > > > > > > > > > > > > > > > > > > > > > can have backwards compatibility
> > > on the
> > > > > > > > current
> > > > > > > > > > > > > proposed
> > > > > > > > > > > > > > > > > options
> > > > > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > > > > > > simple key mapping.
> > > > > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if later
> > > needed
> > > > > we
> > > > > > > can
> > > > > > > > > > > change
> > > > > > > > > > > > it
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > "Extended
> > > > > > > > > > > > > > > > > > > > > > > Resource Manager" (or whatever it
> > > is
> > > > > > > called).
> > > > > > > > > That
> > > > > > > > > > > > > should
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > pure
> > > > > > > > > > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > > > > > > > > > - For ResourceProfile and
> > > ResourceSpec,
> > > > > > > there
> > > > > > > > > are
> > > > > > > > > > > > > already
> > > > > > > > > > > > > > > > > > > fields for
> > > > > > > > > > > > > > > > > > > > > > > general extended resource. We can
> > > of
> > > > > course
> > > > > > > > > > > leverage
> > > > > > > > > > > > > them
> > > > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > > > > > > supporting
> > > > > > > > > > > > > > > > > > > > > > > fine grained GPU scheduling. That
> > > is
> > > > > also
> > > > > > > not
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > scope
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > > > > > > > > > step proposal, and would require
> > > > > FLIP-56 to
> > > > > > > > be
> > > > > > > > > > > > finished
> > > > > > > > > > > > > > > > first.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > To summary up, I agree with
> > Becket
> > > that
> > > > > > > have
> > > > > > > > a
> > > > > > > > > > > > separate
> > > > > > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > general extended resource
> > > mechanism,
> > > > > and
> > > > > > > keep
> > > > > > > > > it in
> > > > > > > > > > > > > mind
> > > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > > > > discussing
> > > > > > > > > > > > > > > > > > > > > > > and implementing the current one.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes total sense to generalize the
> > > > > > > > > > > > > > > > > > > > > > > > > > resource management to support custom resources. Having that allows users
> > > > > > > > > > > > > > > > > > > > > > > > > > to add new resources by themselves. The general resource management may
> > > > > > > > > > > > > > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type definition. It is supported by the extended
> > > > > > > > > > > > > > > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This will likely cover the
> > > > > > > > > > > > > > > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign the resources
> > > > > > > > > > > > > > > > > > > > > > > > > > to different tasks, operators, and so on. This may require two levels /
> > > > > > > > > > > > > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put into suitable slots.
> > > > > > > > > > > > > > > > > > > > > > > > > > It is done by the global RM and is not customizable right now.
> > > > > > > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource to the operators in the TM,
> > > > > > > > > > > > > > > > > > > > > > > > > > e.g. GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > > > > > > > > > > > > > > > > > > > > > > > > assuming the global RM does not distinguish individual resources of the
> > > > > > > > > > > > > > > > > > > > > > > > > > same type. It is true for memory, but not for GPU.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it should discover the
> > > > > > > > > > > > > > > > > > > > > > > > > > physical GPU information and bind/match them to each operator. Making this
> > > > > > > > > > > > > > > > > > > > > > > > > > general will fill in the missing piece to support custom resource type
> > > > > > > > > > > > > > > > > > > > > > > > > > definition. But I'd avoid calling it an "External Resource Manager" to
> > > > > > > > > > > > > > > > > > > > > > > > > > avoid confusion with the RM; maybe something like "Operator Resource
> > > > > > > > > > > > > > > > > > > > > > > > > > Assigner" would be more accurate. So for each resource type users can have
> > > > > > > > > > > > > > > > > > > > > > > > > > an optional "Operator Resource Assigner" in the TM. For memory, users don't
> > > > > > > > > > > > > > > > > > > > > > > > > > need this, but for other extended resources, users may need that.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource Assigner" is achievable
> > > > > > > > > > > > > > > > > > > > > > > > > > in this FLIP. But I am also OK with having that in a separate FLIP because
> > > > > > > > > > > > > > > > > > > > > > > > > > the interface between the "Operator Resource Assigner" and the operator may
> > > > > > > > > > > > > > > > > > > > > > > > > > take a while to settle down if we want to make it generic. But I think our
> > > > > > > > > > > > > > > > > > > > > > > > > > implementation should take this future work into consideration so that we
> > > > > > > > > > > > > > > > > > > > > > > > > > don't need to break backwards compatibility once we have that.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > > > > > > > > > > > > > > > > > > > > > > scheduling and GPU allocation, as I have no experience with that.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the proposal is whether it makes sense to
> > > > > > > > > > > > > > > > > > > > > > > > > > > look at the "GPU Manager" as an "External Resource Manager", with GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > being one such resource.
> > > > > > > > > > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that is how it
> > > > > > > > > > > > > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > > > > > > > > > > > > It has the advantage that it looks more extensible. Maybe there is a GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an Alibaba
> > > > > > > > > > > > > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support is a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > must-have for machine learning use cases. Actually it is one of the most
> > > > > > > > > > > > > > > > > > > > > > > > > > > > frequently asked questions from users who are interested in using Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > > > for ML.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions on the wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in the public
> > > > > > > > > > > > > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > significant, especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think it's a
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > very good first step for Flink's GPU support.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a task executor and
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > forward such requirements to the external resource managers (for
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of available GPU resources to operators.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and expose GPU resource information to the context of functions.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU discovery, in which we provide
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the privilege mode to help user to achieve worker-level isolation in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > to your feedbacks.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Xintong Song <to...@gmail.com>.
I also agree that the pluggable ExternalResourceDriver should be loaded by
the cluster class loader. Although the plugin might be implemented by users,
external resources (as part of task executor resources) should be cluster
configurations, unlike job-level user code such as UDFs, because the task
executors belong to the cluster rather than to jobs.


IIUC, the concern Stephan raised is about potential credential problems
when executing user code on the RM with the cluster class loader. The concern
makes sense to me, and I think what Yangze suggested is a good way to
prevent such credential problems. The only reason we tried to execute user
code (i.e. getKubernetes/YarnExternalResource) on the RM was that we need to
set these key-value pairs on the pod/container requests. By replacing the
getKubernetes/YarnExternalResource interfaces with the configuration options
'external-resource.{resourceName}.yarn/kubernetes.key/amount',
we can still fulfill that purpose without the credential risks.
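
A minimal sketch of what the configuration-only approach could look like on
the RM side. The class and method names below are hypothetical and not part
of Flink. The option names follow the pattern quoted in this thread and may
differ from whatever is finally adopted, and the sketch assumes one shared
amount per resource, as Yangze describes. No user code is loaded at any point.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not a Flink class. */
final class ExternalResourceRequests {

    /** The two deployment targets named in the thread. */
    enum Target {
        YARN("yarn"),
        KUBERNETES("kubernetes");

        final String configSegment;

        Target(String configSegment) {
            this.configSegment = configSegment;
        }
    }

    private ExternalResourceRequests() {}

    /**
     * Builds the key-value pair to attach to a container or pod request,
     * using only configuration values (no user code runs here).
     * Returns an empty map if the resource is not configured for the target.
     */
    static Map<String, Long> toRequestEntries(
            Map<String, String> conf, String resourceName, Target target) {
        String prefix = "external-resource." + resourceName + ".";
        // Assumed option names, following the pattern quoted in the thread.
        String deploymentKey = conf.get(prefix + target.configSegment + ".key");
        String amount = conf.get(prefix + "amount");

        Map<String, Long> entries = new LinkedHashMap<>();
        if (deploymentKey != null && amount != null) {
            entries.put(deploymentKey, Long.parseLong(amount.trim()));
        }
        return entries;
    }
}
```

With external-resource.gpu.amount set to 2 and
external-resource.gpu.kubernetes.key set to nvidia.com/gpu, the helper yields
a single entry mapping nvidia.com/gpu to 2. With no Yarn key configured, the
Yarn lookup yields an empty map.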


Thank you~

Xintong Song



On Mon, Mar 30, 2020 at 5:17 PM Till Rohrmann <tr...@apache.org> wrote:

> At the moment the RM does not have a user code class loader and I agree
> with Stephan that it should stay like this. This, however, does not mean
> that we cannot support pluggable components in the RM. As long as the
> plugins are on the system's class path, it should be fine for the RM to
> load them. For example, we could add external resources via Flink's plugin
> mechanism or something similar.
>
> A very simple implementation of such an ExternalResourceDriver could be a
> class which simply returns what is written in the flink-conf.yaml under a
> given key.
>
> Cheers,
> Till
>
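
A rough sketch of the "very simple implementation" Till describes above: a
driver that only echoes a configured value. The class name, constructor and
method are hypothetical. A real driver would implement the
ExternalResourceDriver interface from the FLIP, whose final shape is not
settled in this thread, so no interface is declared here. The Flink
configuration is modelled as a plain map.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical driver for illustration; not a Flink class. */
final class ConfigBackedDriver {
    private final Map<String, String> flinkConf;
    private final String configKey;

    ConfigBackedDriver(Map<String, String> flinkConf, String configKey) {
        this.flinkConf = flinkConf;
        this.configKey = configKey;
    }

    /** Returns the configured value verbatim, or empty if the key is unset. */
    Optional<String> getExternalResource() {
        return Optional.ofNullable(flinkConf.get(configKey));
    }
}
```

Because such a class only reads configuration, it could sit on the system
class path without bringing job-level user code into the RM.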
> On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo <ka...@gmail.com> wrote:
>
> > Hi, Stephan,
> >
> > I see your concern and I totally agree with you.
> >
> > The interface on the RM side is now `Map<String key, String/Long value>
> > getYarn/KubernetesExternalResource()`. The only valid information the RM
> > gets from it is the configuration key of that external resource in
> > Yarn/K8s. The "String/Long value" would be the same as
> > external-resource.{resourceName}.amount.
> > So, I think it makes sense to replace these two interfaces with two
> > configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> > may lose some extensibility, but AFAIK it could work with common
> > external resources like GPU, FPGA. WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen <se...@apache.org> wrote:
> > >
> > > Maybe one final comment: It is probably not an issue, but let's try and
> > > keep user code (via user code classloader) out of the ResourceManager, if
> > > possible.
> > >
> > > As background:
> > >
> > > There were thoughts in the past to support setups where the RM must run
> > > with "superuser" credentials, but we cannot run JM/TM with these
> > > credentials, as the user code might access them otherwise.
> > > This is actually possible today, you can run the RM in a different JVM or
> > > in a different container, and give it more credentials than JMs / TMs. But
> > > for this to be feasible, we cannot allow any user-defined code to be in the
> > > JVM, because that instantaneously breaks the isolation of credentials.
> > >
> > >
> > >
> > > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <ka...@gmail.com> wrote:
> > >
> > > > Thanks for the feedback, @Till and @Xintong.
> > > >
> > > > Regarding separating the interface, I'm also +1 with it.
> > > >
> > > > Regarding the resource allocation interface, true, it's dangerous to
> > > > give too much access to user code. Changing the return type to
> > > > Map<String key, String/Long value> makes sense to me. AFAIK, it is
> > > > compatible with all the first-party supported resources for
> > > > Yarn/Kubernetes. It could also free us from the potential dependency
> > > > issue.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > >
> > > > > Thanks for updating the FLIP, Yangze.
> > > > >
> > > > > > I agree with Till that we probably want to separate the K8s/Yarn
> > > > > > decorator calls. Users can still configure one driver class, and we can
> > > > > > use `instanceof` to check whether the driver implements the K8s/Yarn
> > > > > > specific interfaces.
> > > > > >
> > > > > > Moreover, I'm not sure about exposing the entire `ContainerRequest` /
> > > > > > `Pod` (`AbstractKubernetesStepDecorator` directly manipulates `Pod`) to
> > > > > > user code. It gives user code more access than needed for defining an
> > > > > > external resource, which might cause problems. Instead, I would suggest
> > > > > > having an interface like `Map<String key, String value>
> > > > > > getYarn/KubernetesExternalResource()` and assembling the entries into
> > > > > > `ContainerRequest` / `Pod` in the Yarn/KubernetesResourceManager.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <trohrmann@apache.org> wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > > I'm a bit late to the party. I think the current proposal looks good.
> > > > > > >
> > > > > > > Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> > > > > > > would suggest to not include the decorator calls for Kubernetes and Yarn in
> > > > > > > the base interface. Instead I would suggest to segregate the deployment
> > > > > > > specific decorator calls into separate interfaces. That way an
> > > > > > > ExternalResourceDriver does not have to support all deployments from the
> > > > > > > very beginning. Moreover, some resources might not be supported by a
> > > > > > > specific deployment target and the natural way to express this would be to
> > > > > > > not implement the respective deployment specific interface.
> > > > > > >
> > > > > > > Moreover, having void
> > > > > > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > > > > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > > > > > classpath whenever the external resource driver is being used.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <se...@apache.org> wrote:
> > > > > >
> > > > > > > Nice, thanks a lot!
> > > > > > >
> > > > > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > > > > >
> > > > > > > > I've updated the FLIP accordingly. I do not add a
> > > > > > > > ResourceInfoProvider. Instead, I introduce the
> > > > ExternalResourceDriver,
> > > > > > > > which takes the responsibility of all relevant operations on
> > both
> > > > RM
> > > > > > > > and TM sides.
> > > > > > > > After a rethink about decoupling the management of external
> > > > resources
> > > > > > > > from TaskExecutor, I think we could do the same thing on the
> > > > > > > > ResourceManager side. We do not need to add a specific
> > allocation
> > > > > > > > logic to the ResourceManager each time we add a specific
> > external
> > > > > > > > resource.
> > > > > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > > > > containerRequest.
> > > > > > > > > - For Kubernetes, ExternalResourceDriver could provide a
> > > > > > > > > decorator for the TM pod.
> > > > > > > >
> > > > > > > > In this way, just like MetricReporter, we allow users to
> define
> > > > their
> > > > > > > > custom ExternalResourceDriver. It is more extensible and fits
> > the
> > > > > > > > separation of concerns. For more details, please take a look
> at
> > > > [1].
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <
> sewen@apache.org
> > >
> > > > wrote:
> > > > > > > > >
> > > > > > > > > This sounds good to go ahead from my side.
> > > > > > > > >
> > > > > > > > > I like the approach that Becket suggested - in that case
> the
> > core
> > > > > > > > > abstraction that everyone would need to understand would be
> > > > "external
> > > > > > > > > resource allocation" and the "ResourceInfoProvider", and
> the
> > GPU
> > > > > > > specific
> > > > > > > > > code would be a specific implementation only known to that
> > > > component
> > > > > > > that
> > > > > > > > > allocates the external resource. That fits the separation
> of
> > > > concerns
> > > > > > > > well.
> > > > > > > > >
> > > > > > > > > I also understand that it should not be over-engineered in
> > the
> > > > first
> > > > > > > > > version, so some simplification makes sense, and then
> > gradually
> > > > > > expand
> > > > > > > > from
> > > > > > > > > there.
> > > > > > > > >
> > > > > > > > > So +1 to go ahead with what was suggested above (Xintong /
> > > > Becket)
> > > > > > from
> > > > > > > > my
> > > > > > > > > side.
> > > > > > > > >
> > > > > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <
> > > > tonysong820@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > > > > >
> > > > > > > > > > @Stephan
> > > > > > > > > >
> > > > > > > > > > I see your concern, and I completely agree with you that
> we
> > > > should
> > > > > > > > first
> > > > > > > > > > think about the "library" / "plugin" / "extension" style
> if
> > > > > > possible.
> > > > > > > > > >
> > > > > > > > > > If GPUs are sliced and assigned during scheduling, there
> > may be
> > > > > > > reason,
> > > > > > > > > > > although it looks that it would belong to the slot
> then.
> > Is
> > > > that
> > > > > > > > what we
> > > > > > > > > > > are doing here?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In the current proposal, we do not have the GPUs sliced
> and
> > > > > > assigned
> > > > > > > to
> > > > > > > > > > slots, because it could be problematic without dynamic
> slot
> > > > > > > allocation.
> > > > > > > > > > E.g., the number of GPUs might not be evenly divisible by
> > the
> > > > > > number
> > > > > > > of
> > > > > > > > > > slots.
> > > > > > > > > >
> > > > > > > > > > I think it makes sense to eventually have the GPUs
> > assigned to
> > > > > > slots.
> > > > > > > > Even
> > > > > > > > > > then, we might still need a TM level GPUManager (or
> > > > > > ResourceProvider
> > > > > > > > like
> > > > > > > > > > Becket suggested). For memory, in each slot we can simply
> > > > request
> > > > > > the
> > > > > > > > > > amount of memory, leaving it to JVM / OS to decide which
> > memory
> > > > > > > > (address)
> > > > > > > > > > should be assigned. For GPU, and potentially other
> > resources
> > > > like
> > > > > > > > FPGA, we
> > > > > > > > > > need to explicitly specify which GPU (index) should be
> > used.
> > > > > > > > Therefore, we
> > > > > > > > > > need some component at the TM level to coordinate which
> > slot
> > > > uses
> > > > > > > which
> > > > > > > > > > GPU.
> > > > > > > > > >
> > > > > > > > > > IMO, unless we say Flink will not support slot-level GPU
> > > > slicing at
> > > > > > > > least
> > > > > > > > > > in the foreseeable future, I don't see a good way to
> avoid
> > > > touching
> > > > > > > > the TM
> > > > > > > > > > core. To that end, I think Becket's suggestion points to
> a
> > good
> > > > > > > > direction,
> > > > > > > > > > that supports more features (GPU, FPGA, etc.) with less
> > > > coupling to
> > > > > > > > the TM
> > > > > > > > > > core (only needs to understand the general interfaces).
> The
> > > > > > detailed
> > > > > > > > > > implementation for specific resource types can even be
> > > > encapsulated
> > > > > > > as
> > > > > > > > a
> > > > > > > > > > library.
> > > > > > > > > >
> > > > > > > > > > @Becket
> > > > > > > > > >
> > > > > > > > > > Thanks for sharing your thought on the final state.
> > Despite the
> > > > > > > > details how
> > > > > > > > > > the interfaces should look like, I think this is a really
> > good
> > > > > > > > abstraction
> > > > > > > > > > for supporting general resource types.
> > > > > > > > > >
> > > > > > > > > > I'd like to further clarify that, the following three
> > things
> > > > are
> > > > > > all
> > > > > > > > that
> > > > > > > > > > the "Flink core" needs to understand.
> > > > > > > > > >
> > > > > > > > > >    - The *amount* of resource, for scheduling. Actually,
> we
> > > > already
> > > > > > > > have
> > > > > > > > > >    the Resource class in ResourceProfile and ResourceSpec
> > for
> > > > > > > extended
> > > > > > > > > >    resource. It's just not really used.
> > > > > > > > > >    - The *info*, that Flink provides to the operators /
> > user
> > > > codes.
> > > > > > > > > >    - The *provider*, which generates the info based on
> the
> > > > amount.
> > > > > > > > > >
> > > > > > > > > > The "core" does not need to understand the specific
> > > > implementation
> > > > > > > > details
> > > > > > > > > > of the above three. They can even be implemented in a
> > 3rd-party
> > > > > > > > library.
> > > > > > > > > > Similar to how we allow users to define their custom
> > > > > > MetricReporter.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <
> > > > becket.qin@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for the comment, Stephan.
> > > > > > > > > > >
> > > > > > > > > > >   - If everything becomes a "core feature", it will
> make
> > the
> > > > > > > project
> > > > > > > > hard
> > > > > > > > > > > > to develop in the future. Thinking "library" /
> > "plugin" /
> > > > > > > > "extension"
> > > > > > > > > > > style
> > > > > > > > > > > > where possible helps.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Completely agree. It is much more important to design a
> > > > mechanism
> > > > > > > > than
> > > > > > > > > > > focusing on a specific case. Here is what I am thinking
> > to
> > > > fully
> > > > > > > > support
> > > > > > > > > > > custom resource management:
> > > > > > > > > > > 1. On the JM / RM side, use ResourceProfile and
> > ResourceSpec
> > > > to
> > > > > > > > define
> > > > > > > > > > the
> > > > > > > > > > > resource and the amount required. They will be used to
> > find
> > > > > > > suitable
> > > > > > > > TMs
> > > > > > > > > > > slots to run the tasks. At this point, the resources
> are
> > only
> > > > > > > > measured by
> > > > > > > > > > > amount, i.e. they do not have individual ID.
> > > > > > > > > > >
> > > > > > > > > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to
> > > > > > > > > > > > identify and provide the detailed information of the individual
> > > > > > > > > > > > resource, e.g. GPU ID. It is important because the operator may have to
> > > > > > > > > > > > explicitly interact with the physical resource it uses. The
> > > > > > > > > > > > ResourceInfoProvider might look like something below.
> > > > > > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > > > > > > > > >         OperatorId opId, ResourceProfile resourceProfile);
> > > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > - There could be several "*ResourceInfoProvider*"
> > configured
> > > > on
> > > > > > the
> > > > > > > > TM to
> > > > > > > > > > > retrieve the information for different resources.
> > > > > > > > > > > - The TM will be responsible to assign those individual
> > > > resources
> > > > > > > to
> > > > > > > > each
> > > > > > > > > > > operator according to their requested amount.
> > > > > > > > > > > - The operators will be able to get the ResourceInfo
> from
> > > > their
> > > > > > > > > > > RuntimeContext.
> > > > > > > > > > >
> > > > > > > > > > > > If we agree this is a reasonable final state, we can adapt the current
> > > > > > > > > > > > FLIP to it. In fact, it does not sound like a big change to me. All the
> > > > > > > > > > > > proposed configuration can stay as is; it is just that Flink itself
> > > > > > > > > > > > won't care about them. Instead, a GPUInfoProvider implementing the
> > > > > > > > > > > > ResourceInfoProvider will use them.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> > > > sewen@apache.org>
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi all!
> > > > > > > > > > > >
> > > > > > > > > > > > The main point I wanted to throw into the discussion
> > is the
> > > > > > > > following:
> > > > > > > > > > > >   - With more and more use cases, more and more tools
> > go
> > > > into
> > > > > > > Flink
> > > > > > > > > > > >   - If everything becomes a "core feature", it will
> > make
> > > > the
> > > > > > > > project
> > > > > > > > > > hard
> > > > > > > > > > > > to develop in the future. Thinking "library" /
> > "plugin" /
> > > > > > > > "extension"
> > > > > > > > > > > style
> > > > > > > > > > > > where possible helps.
> > > > > > > > > > > >
> > > > > > > > > > > >   - A good thought experiment is always: How many
> > future
> > > > > > > developers
> > > > > > > > > > have
> > > > > > > > > > > to
> > > > > > > > > > > > interact with this code (and possibly understand it
> > > > partially),
> > > > > > > > even if
> > > > > > > > > > > the
> > > > > > > > > > > > features they touch have nothing to do with GPU
> > support. If
> > > > > > many
> > > > > > > > > > > > contributors to unrelated features will have to touch
> > it
> > > > and
> > > > > > > > understand
> > > > > > > > > > > it,
> > > > > > > > > > > > then let's think if there is a different solution.
> > Maybe
> > > > there
> > > > > > is
> > > > > > > > not,
> > > > > > > > > > > but
> > > > > > > > > > > > then we should be sure why.
> > > > > > > > > > > >
> > > > > > > > > > > >   - That led me to raising this issue: If the GPU
> > manager
> > > > > > > becomes a
> > > > > > > > > > core
> > > > > > > > > > > > service in the TaskManager, Environment,
> > RuntimeContext,
> > > > etc.
> > > > > > > then
> > > > > > > > > > > everyone
> > > > > > > > > > > > developing TM and streaming tasks need to understand
> > the
> > > > GPU
> > > > > > > > manager.
> > > > > > > > > > > That
> > > > > > > > > > > > seems oddly specific, is my impression.
> > > > > > > > > > > >
> > > > > > > > > > > > Access to configuration seems not the right reason to
> > do
> > > > that.
> > > > > > We
> > > > > > > > > > should
> > > > > > > > > > > > expose the Flink configuration from the
> RuntimeContext
> > > > anyways.
> > > > > > > > > > > >
> > > > > > > > > > > > If GPUs are sliced and assigned during scheduling,
> > there
> > > > may be
> > > > > > > > reason,
> > > > > > > > > > > > although it looks that it would belong to the slot
> > then. Is
> > > > > > that
> > > > > > > > what
> > > > > > > > > > we
> > > > > > > > > > > > are doing here?
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Stephan
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > > > > >
> > > > > > > > > > > > > IMO, eventually an operator should only see info of
> > GPUs
> > > > that
> > > > > > > are
> > > > > > > > > > > > dedicated
> > > > > > > > > > > > > for it, instead of all GPUs on the
> machine/container
> > in
> > > > the
> > > > > > > > current
> > > > > > > > > > > > design.
> > > > > > > > > > > > > It does not make sense to let the user who writes a
> > UDF
> > > > to
> > > > > > > worry
> > > > > > > > > > about
> > > > > > > > > > > > > coordination among multiple operators running on
> the
> > same
> > > > > > > > machine.
> > > > > > > > > > And
> > > > > > > > > > > if
> > > > > > > > > > > > > we want to limit the GPU info an operator sees, we
> > > > should not
> > > > > > > > let the
> > > > > > > > > > > > > operator to instantiate GPUManager, which means we
> > have
> > > > to
> > > > > > > expose
> > > > > > > > > > > > something
> > > > > > > > > > > > > through runtime context, either GPU info or some
> > kind of
> > > > > > > limited
> > > > > > > > > > access
> > > > > > > > > > > > to
> > > > > > > > > > > > > the GPUManager.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> > > > > > > becket.qin@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > It probably makes sense for us to first agree on
> the
> > > > final
> > > > > > > > state.
> > > > > > > > > > More
> > > > > > > > > > > > > > specifically, will the resource info be exposed
> > through
> > > > > > > runtime
> > > > > > > > > > > context
> > > > > > > > > > > > > > eventually?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If that is the final state and we have a seamless
> > > > migration
> > > > > > > > story
> > > > > > > > > > > from
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > FLIP to that final state, Personally I think it
> is
> > OK
> > > > to
> > > > > > > > expose the
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > info in the runtime context.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > > > > > > > > tonysong820@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > > > > I think what Stephan means (@Stephan, please
> > correct
> > > > me
> > > > > > if
> > > > > > > > I'm
> > > > > > > > > > > wrong)
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > that, we might not need to hold and maintain
> the
> > > > > > GPUManager
> > > > > > > > as a
> > > > > > > > > > > > > service
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > TaskManagerServices or RuntimeContext. An
> > > > alternative is
> > > > > > to
> > > > > > > > > > create
> > > > > > > > > > > /
> > > > > > > > > > > > > > > retrieve the GPUManager only in the operators
> > that
> > > > need
> > > > > > it,
> > > > > > > > e.g.,
> > > > > > > > > > > > with
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > > > > I agree with you on excluding GPUManager from
> > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >    - For the first step, where we provide
> unified
> > > > > > TM-level
> > > > > > > > GPU
> > > > > > > > > > > > > > information
> > > > > > > > > > > > > > >    to all operators, it should be fine to have
> > > > operators
> > > > > > > > access /
> > > > > > > > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > > > > > > > >    - In future, we might have some more
> > fine-grained
> > > > GPU
> > > > > > > > > > > management,
> > > > > > > > > > > > > > where
> > > > > > > > > > > > > > >    we need to maintain GPUManager as a service
> > and
> > > > put
> > > > > > GPU
> > > > > > > > info
> > > > > > > > > > in
> > > > > > > > > > > > slot
> > > > > > > > > > > > > > >    profiles. But at least for now it's not
> > necessary
> > > > to
> > > > > > > > introduce
> > > > > > > > > > > > such
> > > > > > > > > > > > > > >    complexity.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > However, I have some concerns on excluding
> > GPUManager
> > > > > > from
> > > > > > > > > > > > > RuntimeContext
> > > > > > > > > > > > > > > and let operators access it directly.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >    - Configurations needed for creating the GPUManager are not
> > > > > > > > > > > > > > > >    always available for operators.
> > > > > > > > > > > > > > >    - If later we want to have fine-grained
> > control
> > > > over
> > > > > > GPU
> > > > > > > > > > (e.g.,
> > > > > > > > > > > > > > >    operators in each slot can only see GPUs
> > reserved
> > > > for
> > > > > > > that
> > > > > > > > > > > slot),
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I would suggest to wrap the GPUManager behind
> > > > > > > RuntimeContext
> > > > > > > > and
> > > > > > > > > > > only
> > > > > > > > > > > > > > > expose the GPUInfo to users. For now, we can
> > declare
> > > > a
> > > > > > > method
> > > > > > > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a
> default
> > > > > > definition
> > > > > > > > that
> > > > > > > > > > > > calls
> > > > > > > > > > > > > > > `GPUManager.get()` to get the lazily-created
> > > > GPUManager.
> > > > > > If
> > > > > > > > later
> > > > > > > > > > > we
> > > > > > > > > > > > > want
> > > > > > > > > > > > > > > to create / retrieve GPUManager in a different
> > way,
> > > > we
> > > > > > can
> > > > > > > > simply
> > > > > > > > > > > > > change
> > > > > > > > > > > > > > > how `getGPUInfo` is implemented, without
> needing
> > to
> > > > > > change
> > > > > > > > any
> > > > > > > > > > > public
> > > > > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > > > Do you mean Minicluster? Yes, it makes sense
> to
> > > > share
> > > > > > the
> > > > > > > > GPU
> > > > > > > > > > > > Manager
> > > > > > > > > > > > > > > > in such scenario.
> > > > > > > > > > > > > > > > If that's what you worry about, I'm +1 for
> > holding
> > > > > > > > > > > > > > > > GPUManager(ExternalResourceManagers) in
> > > > TaskExecutor
> > > > > > > > instead of
> > > > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Regarding the RuntimeContext/FunctionContext,
> > it
> > > > just
> > > > > > > > holds the
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > info instead of the GPU Manager. AFAIK, it's
> > the
> > > > only
> > > > > > > > place we
> > > > > > > > > > > > could
> > > > > > > > > > > > > > > > pass GPU info to the
> > > > RichFunction/UserDefinedFunction.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac
> Godfried
> > <
> > > > > > > > > > > > isaac@paddlesoft.net
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> > > > > > > sewen@apache.org
> > > > > > > > > > wrote
> > > > > > > > > > > > > ----
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Can we somehow keep this out of the
> > > > TaskManager
> > > > > > > > services
> > > > > > > > > > > > > > > > > > I fear that we could not. IMO, the
> > > > GPUManager(or
> > > > > > > > > > > > > > > > > > ExternalServicesManagers in future) is
> > > > conceptually
> > > > > > > > one of
> > > > > > > > > > > the
> > > > > > > > > > > > > task
> > > > > > > > > > > > > > > > > > manager services, just like MemoryManager
> > > > before
> > > > > > > 1.10.
> > > > > > > > > > > > > > > > > > - It maintains/holds the GPU resource at
> TM
> > > > level
> > > > > > and
> > > > > > > > all
> > > > > > > > > > of
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > operators allocate the GPU resources from
> > it.
> > > > So,
> > > > > > it
> > > > > > > > should
> > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > > > > > > > - We could add a collection called
> > > > > > > > ExternalResourceManagers
> > > > > > > > > > > to
> > > > > > > > > > > > > hold
> > > > > > > > > > > > > > > > > > all managers of other external resources
> > in the
> > > > > > > future.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can you help me understand why this needs the
> > > > > > > > > > > > > > > > > > addition in TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > > > > > > > > Are you worried about the case when
> multiple
> > Task
> > > > > > > > Executors
> > > > > > > > > > run
> > > > > > > > > > > > in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > JVM? That's not common, but wouldn't it
> > actually
> > > > be
> > > > > > > good
> > > > > > > > in
> > > > > > > > > > > that
> > > > > > > > > > > > > case
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > share the GPU Manager, given that the GPU
> is
> > > > shared?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > > > > > > > In this FLIP, operators need the
> > information.
> > > > Thus,
> > > > > > > we
> > > > > > > > > > expose
> > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > information to the
> > > > RuntimeContext/FunctionContext.
> > > > > > > The
> > > > > > > > slot
> > > > > > > > > > > > > profile
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > not aware of GPU resources as GPU is TM
> > level
> > > > > > > resource
> > > > > > > > now.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Can the GPU Manager be a "self
> contained"
> > > > thing
> > > > > > > that
> > > > > > > > > > simply
> > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > configuration, and then abstracts
> > everything
> > > > > > > > internally?
> > > > > > > > > > > > > > > > > > Yes, we just pass the path/args of the
> > discover
> > > > > > > script
> > > > > > > > and
> > > > > > > > > > > how
> > > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > > GPUs per TM to it. It takes the
> > responsibility
> > > > to
> > > > > > get
> > > > > > > > the
> > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > information and expose them to the
> > > > > > > > > > > > RuntimeContext/FunctionContext
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > Operators. Meanwhile, we'd better not
> allow
> > > > > > operators
> > > > > > > > to
> > > > > > > > > > > > directly
> > > > > > > > > > > > > > > > > > access GPUManager, it should get what
> they
> > want
> > > > > > from
> > > > > > > > > > Context.
> > > > > > > > > > > > We
> > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > then decouple the
> interface/implementation
> > of
> > > > > > > > GPUManager
> > > > > > > > > > and
> > > > > > > > > > > > > Public
> > > > > > > > > > > > > > > > > > API.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan
> > Ewen <
> > > > > > > > > > > sewen@apache.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > It sounds fine to initially start with
> > GPU
> > > > > > specific
> > > > > > > > > > support
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > > generalizing this once we better
> > understand
> > > > the
> > > > > > > > space.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > About the implementation suggested in
> > > > FLIP-108:
> > > > > > > > > > > > > > > > > > > - Can we somehow keep this out of the
> > > > TaskManager
> > > > > > > > > > services?
> > > > > > > > > > > > > > > Anything
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > have to pull through all layers of the
> TM
> > > > makes
> > > > > > the
> > > > > > > > TM
> > > > > > > > > > > > > components
> > > > > > > > > > > > > > > yet
> > > > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - What parts need information about
> this?
> > > > > > > > > > > > > > > > > > > -> do the slot profiles need
> information
> > > > about
> > > > > > the
> > > > > > > > GPU?
> > > > > > > > > > > > > > > > > > > -> Can the GPU Manager be a "self
> > contained"
> > > > > > thing
> > > > > > > > that
> > > > > > > > > > > > simply
> > > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > > > the configuration, and then abstracts
> > > > everything
> > > > > > > > > > > internally?
> > > > > > > > > > > > > > > > Operators
> > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze
> > Guo <
> > > > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo,
> you're
> > > > right,
> > > > > > > > I'll add
> > > > > > > > > > > > them
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > > > > Regarding the general extended
> resource
> > > > > > > mechanism,
> > > > > > > > I
> > > > > > > > > > > second
> > > > > > > > > > > > > > > > Xintong's
> > > > > > > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and ResourceSpec after we
> > > > > > > > > > > > > > > > > > > > > support fine-grained GPU scheduling. As a first step proposal, I
> > > > > > > > > > > > > > > > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > > > > > > > > > > > > > > > > correctly, it is just a code refactoring atm; we could extract the
> > > > > > > > > > > > > > > > > > > > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > > > > > > > > > > > > > > > > > > > that is the case, +1 to do it during implementation.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark supports a general "Custom
> > > > > > > > > > > > > > > > > > > > > Resource Scheduling" before and decided to introduce a common resource
> > > > > > > > > > > > > > > > > > > > > configuration schema
> > > > > > > > > > > > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > > > > > to make it more extensible. I think the "resource" is a proper level
> > > > > > > > > > > > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM
> Xingbo
> > > > Huang <
> > > > > > > > > > > > > > hxbks2ks@gmail.com
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > There is no doubt that GPU resource management support will greatly
> > > > > > > > > > > > > > > > > > > > > > facilitate the development of AI-related applications by PyFlink users.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Regarding the names of several GPU configurations, I think it is better
> > > > > > > > > > > > > > > > > > > > > > to delete the resource field to make them consistent with the names of
> > > > > > > > > > > > > > > > > > > > > > other resource-related configurations in TaskManagerOption.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline discussion about making
> > > > > > > > > > > > > > > > > > > > > > > the "GPU Support" a general "Extended Resource Support". We believe
> > > > > > > > > > > > > > > > > > > > > > > supporting extended resources through a general mechanism is definitely
> > > > > > > > > > > > > > > > > > > > > > > a good and extensible way. The reason we propose this FLIP with its scope
> > > > > > > > > > > > > > > > > > > > > > > narrowed down to GPU alone is mainly the concern about the extra effort
> > > > > > > > > > > > > > > > > > > > > > > and review capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > To come up with a good design for a general extended resource management
> > > > > > > > > > > > > > > > > > > > > > > mechanism, we would need to investigate more on how people use different
> > > > > > > > > > > > > > > > > > > > > > > kinds of resources in practice. For GPU, we learnt such knowledge from
> > > > > > > > > > > > > > > > > > > > > > > the experts, Becket and his team members. But for FPGA, or other
> > > > > > > > > > > > > > > > > > > > > > > potential extended resources, we don't have such convenient information
> > > > > > > > > > > > > > > > > > > > > > > sources, so the investigation would require more effort, which I tend to
> > > > > > > > > > > > > > > > > > > > > > > think is not necessary atm.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On the other hand, we also looked into how Spark supports a general
> > > > > > > > > > > > > > > > > > > > > > > "Custom Resource Scheduling". Assuming we want to have a similar general
> > > > > > > > > > > > > > > > > > > > > > > extended resource mechanism in the future, we believe that the current
> > > > > > > > > > > > > > > > > > > > > > > GPU support design can be easily extended, in an incremental way without
> > > > > > > > > > > > > > > > > > > > > > > too much rework.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > - The most important part is probably the user interfaces. Spark offers
> > > > > > > > > > > > > > > > > > > > > > > configuration options to define the amount, discovery script and vendor
> > > > > > > > > > > > > > > > > > > > > > > (on k8s) on a per resource type basis [1], which is very similar to what
> > > > > > > > > > > > > > > > > > > > > > > we proposed in this FLIP. I think it's not necessary to expose config
> > > > > > > > > > > > > > > > > > > > > > > options in the general way atm, since we do not have support for other
> > > > > > > > > > > > > > > > > > > > > > > resource types now. If later we decide to have per resource type config
> > > > > > > > > > > > > > > > > > > > > > > options, we can keep backwards compatibility with the currently proposed
> > > > > > > > > > > > > > > > > > > > > > > options through a simple key mapping.
> > > > > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can change it to an "Extended
> > > > > > > > > > > > > > > > > > > > > > > Resource Manager" (or whatever it is called). That should be a pure
> > > > > > > > > > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are already fields for
> > > > > > > > > > > > > > > > > > > > > > > general extended resources. We can of course leverage them when
> > > > > > > > > > > > > > > > > > > > > > > supporting fine grained GPU scheduling. That is also not in the scope of
> > > > > > > > > > > > > > > > > > > > > > > this first step proposal, and would require FLIP-56 to be finished first.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate FLIP for
> > > > > > > > > > > > > > > > > > > > > > > the general extended resource mechanism, and keep it in mind when
> > > > > > > > > > > > > > > > > > > > > > > discussing and implementing the current one.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes total sense to generalize the
> > > > > > > > > > > > > > > > > > > > > > > > resource management to support custom resources. Having that allows
> > > > > > > > > > > > > > > > > > > > > > > > users to add new resources by themselves. The general resource
> > > > > > > > > > > > > > > > > > > > > > > > management may involve two different aspects:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type definition. It is supported by the extended
> > > > > > > > > > > > > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This will likely cover
> > > > > > > > > > > > > > > > > > > > > > > > the majority of the cases.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > > > > > > > > > > > > > > > > > > > > > > resources to different tasks, operators, and so on. This may require
> > > > > > > > > > > > > > > > > > > > > > > > two levels / steps:
> > > > > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put into suitable slots.
> > > > > > > > > > > > > > > > > > > > > > > > It is done by the global RM and is not customizable right now.
> > > > > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource to the operators in the TM,
> > > > > > > > > > > > > > > > > > > > > > > > e.g. GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > > > > > > > > > > > > > > > > > > > > > > assuming the global RM does not distinguish individual resources of
> > > > > > > > > > > > > > > > > > > > > > > > the same type. It is true for memory, but not for GPU.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it should discover the
> > > > > > > > > > > > > > > > > > > > > > > > physical GPU information and bind/match the GPUs to each operator.
> > > > > > > > > > > > > > > > > > > > > > > > Making this general will fill in the missing piece to support custom
> > > > > > > > > > > > > > > > > > > > > > > > resource type definition. But I'd avoid calling it an "External
> > > > > > > > > > > > > > > > > > > > > > > > Resource Manager" to avoid confusion with the RM; maybe something like
> > > > > > > > > > > > > > > > > > > > > > > > "Operator Resource Assigner" would be more accurate. So for each
> > > > > > > > > > > > > > > > > > > > > > > > resource type users can have an optional "Operator Resource Assigner"
> > > > > > > > > > > > > > > > > > > > > > > > in the TM. For memory, users don't need this, but for other extended
> > > > > > > > > > > > > > > > > > > > > > > > resources, users may need that.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > > > > > > > > > > > > > > > > > > > > > > achievable in this FLIP. But I am also OK with having that in a
> > > > > > > > > > > > > > > > > > > > > > > > separate FLIP because the interface between the "Operator Resource
> > > > > > > > > > > > > > > > > > > > > > > > Assigner" and the operator may take a while to settle down if we want
> > > > > > > > > > > > > > > > > > > > > > > > to make it generic. But I think our implementation should take this
> > > > > > > > > > > > > > > > > > > > > > > > future work into consideration so that we don't need to break
> > > > > > > > > > > > > > > > > > > > > > > > backwards compatibility once we have that.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > > > > > > > > > > > > > > > > > > > > scheduling and GPU allocation, as I have no experience with that.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the proposal is whether it makes sense
> > > > > > > > > > > > > > > > > > > > > > > > > to look at the "GPU Manager" as an "External Resource Manager", with
> > > > > > > > > > > > > > > > > > > > > > > > > GPU being one such resource.
> > > > > > > > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that is how
> > > > > > > > > > > > > > > > > > > > > > > > > it is done there.
> > > > > > > > > > > > > > > > > > > > > > > > > It has the advantage that it looks more extensible. Maybe there is a
> > > > > > > > > > > > > > > > > > > > > > > > > GPU Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an
> > > > > > > > > > > > > > > > > > > > > > > > > Alibaba TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support is a
> > > > > > > > > > > > > > > > > > > > > > > > > > must-have for machine learning use cases. Actually it is one of the
> > > > > > > > > > > > > > > > > > > > > > > > > > most frequently asked questions from the users who are interested
> > > > > > > > > > > > > > > > > > > > > > > > > > in using Flink for ML.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in the
> > > > > > > > > > > > > > > > > > > > > > > > > > public interface section.
> > > > > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink is
> > > > > > > > > > > > > > > > > > > > > > > > > > > significant, especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think
> > > > > > > > > > > > > > > > > > > > > > > > > > > it's a very good first step for Flink's GPU support.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Enable users to configure how many GPUs are in a task executor
> > > > > > > > > > > > > > > > > > > > > > > > > > > > and forward such requirements to the external resource managers
> > > > > > > > > > > > > > > > > > > > > > > > > > > > (for Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of available GPU resources to operators.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > discover and expose GPU resource information to the context of
> > > > > > > > > > > > > > > > > > > > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU discovery, in which we
> > > > > > > > > > > > > > > > > > > > > > > > > > > > provide the privilege mode to help users achieve worker-level
> > > > > > > > > > > > > > > > > > > > > > > > > > > > isolation in standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking
> > > > > > > > > > > > > > > > > > > > > > > > > > > > forward to your feedback.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > > > > > >

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Till Rohrmann <tr...@apache.org>.
At the moment the RM does not have a user code class loader and I agree
with Stephan that it should stay like this. This, however, does not mean
that we cannot support pluggable components in the RM. As long as the
plugins are on the system's class path, it should be fine for the RM to
load them. For example, we could add external resources via Flink's plugin
mechanism or something similar.

A very simple implementation of such an ExternalResourceDriver could be a
class which simply returns what is written in the flink-conf.yaml under a
given key.
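
A minimal sketch of that idea, assuming a hypothetical single-method
ExternalResourceDriver interface (the interface in the FLIP may well look
different) and a plain Map standing in for the parsed flink-conf.yaml. The
class names and the example key are only for illustration:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ConfigBackedDriverDemo {
    public static void main(String[] args) {
        // Stand-in for the parsed flink-conf.yaml (assumption for this sketch).
        Map<String, String> flinkConf = new HashMap<>();
        flinkConf.put("external-resource.gpu.amount", "2");

        ExternalResourceDriver driver =
                new ConfigBackedResourceDriver(flinkConf, "external-resource.gpu.amount");
        System.out.println(driver.getExternalResource());
    }
}

// Hypothetical interface, not the one defined in the FLIP.
interface ExternalResourceDriver {
    /** Key/value pairs the RM should attach to its container or pod request. */
    Map<String, String> getExternalResource();
}

/** Returns whatever the configuration holds under one key; it runs no user code. */
final class ConfigBackedResourceDriver implements ExternalResourceDriver {
    private final Map<String, String> flinkConf;
    private final String configKey;

    ConfigBackedResourceDriver(Map<String, String> flinkConf, String configKey) {
        this.flinkConf = flinkConf;
        this.configKey = configKey;
    }

    @Override
    public Map<String, String> getExternalResource() {
        String value = flinkConf.get(configKey);
        if (value == null) {
            // Nothing configured under this key, so there is nothing to request.
            return Collections.emptyMap();
        }
        return Collections.singletonMap(configKey, value);
    }
}
```

Since such a driver only echoes configuration, it could live on the system
class path without pulling any user code into the RM.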

Cheers,
Till

On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo <ka...@gmail.com> wrote:

> Hi, Stephan,
>
> I see your concern and I totally agree with you.
>
> The interface on the RM side is now `Map<String key, String/Long value>
> getYarn/KubernetesExternalResource()`. The only valid information the RM
> gets from it is the configuration key of that external resource in
> Yarn/K8s. The "String/Long value" would be the same as the
> external-resource.{resourceName}.amount.
> So, I think it makes sense to replace these two interfaces with two
> configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> may lose some extensibility, but AFAIK it could work with common
> external resources like GPU, FPGA. WDYT?
>
> Best,
> Yangze Guo
>
> On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen <se...@apache.org> wrote:
> >
> > Maybe one final comment: It is probably not an issue, but let's try and
> > keep user code (via user code classloader) out of the ResourceManager, if
> > possible.
> >
> > As background:
> >
> > There were thoughts in the past to support setups where the RM must run
> > with "superuser" credentials, but we cannot run JM/TM with these
> > credentials, as the user code might access them otherwise.
> > This is actually possible today, you can run the RM in a different JVM or
> > in a different container, and give it more credentials than JMs / TMs. But
> > for this to be feasible, we cannot allow any user-defined code to be in the
> > JVM, because that instantaneously breaks the isolation of credentials.
> >
> >
> >
> > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Thanks for the feedback, @Till and @Xintong.
> > >
> > > Regarding separating the interface, I'm also +1 with it.
> > >
> > > Regarding the resource allocation interface, true, it's dangerous to
> > > give much access to user codes. Changing the return type to Map<String
> > > key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > could also free us from the potential dependency issue as well.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > > >
> > > > Thanks for updating the FLIP, Yangze.
> > > >
> > > > I agree with Till that we probably want to separate the K8s/Yarn decorator
> > > > calls. Users can still configure one driver class, and we can use
> > > > `instanceof` to check whether the driver implements K8s/Yarn specific
> > > > interfaces.
> > > >
> > > > Moreover, I'm not sure about exposing the entire `ContainerRequest` / `Pod`
> > > > (`AbstractKubernetesStepDecorator` directly manipulates `Pod`) to user
> > > > code. It gives user code more access than needed for defining an external
> > > > resource, which might cause problems. Instead, I would suggest having an
> > > > interface like `Map<String key, String value>
> > > > getYarn/KubernetesExternalResource()` and assembling the result into
> > > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <tr...@apache.org>
> > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I'm a bit late to the party. I think the current proposal looks good.
> > > > >
> > > > > Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> > > > > would suggest not including the decorator calls for Kubernetes and Yarn in
> > > > > the base interface. Instead, I would suggest segregating the
> > > > > deployment-specific decorator calls into separate interfaces. That way an
> > > > > ExternalResourceDriver does not have to support all deployments from the
> > > > > very beginning. Moreover, some resources might not be supported by a
> > > > > specific deployment target, and the natural way to express this would be to
> > > > > not implement the respective deployment-specific interface.
> > > > >
> > > > > Moreover, having void
> > > > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > > > classpath whenever the external resource driver is being used.
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > >
> > > > > > Nice, thanks a lot!
> > > > > >
> > > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <ka...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > > > >
> > > > > > > I've updated the FLIP accordingly. I did not add a
> > > > > > > ResourceInfoProvider. Instead, I introduced the ExternalResourceDriver,
> > > > > > > which takes responsibility for all relevant operations on both the RM
> > > > > > > and TM sides.
> > > > > > > After rethinking how to decouple the management of external resources
> > > > > > > from TaskExecutor, I think we could do the same thing on the
> > > > > > > ResourceManager side. We do not need to add specific allocation
> > > > > > > logic to the ResourceManager each time we add a specific external
> > > > > > > resource.
> > > > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > > > containerRequest.
> > > > > > > - For Kubernetes, ExternalResourceDriver could provide a decorator for
> > > > > > > the TM pod.
> > > > > > >
> > > > > > > In this way, just like MetricReporter, we allow users to define their
> > > > > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > > > > separation of concerns. For more details, please take a look at [1].
> > > > > > >
> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <sewen@apache.org
> >
> > > wrote:
> > > > > > > >
> > > > > > > > This sounds good to go ahead from my side.
> > > > > > > >
> > > > > > > > I like the approach that Becket suggested - in that case the core
> > > > > > > > abstraction that everyone would need to understand would be "external
> > > > > > > > resource allocation" and the "ResourceInfoProvider", and the GPU specific
> > > > > > > > code would be a specific implementation only known to that component that
> > > > > > > > allocates the external resource. That fits the separation of concerns well.
> > > > > > > >
> > > > > > > > I also understand that it should not be over-engineered in the first
> > > > > > > > version, so some simplification makes sense, and then gradually expand from
> > > > > > > > there.
> > > > > > > >
> > > > > > > > So +1 to go ahead with what was suggested above (Xintong / Becket) from my
> > > > > > > > side.
> > > > > > > >
> > > > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <
> > > tonysong820@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > > > >
> > > > > > > > > @Stephan
> > > > > > > > >
> > > > > > > > > I see your concern, and I completely agree with you that we
> > > should
> > > > > > > first
> > > > > > > > > think about the "library" / "plugin" / "extension" style if
> > > > > possible.
> > > > > > > > >
> > > > > > > > > If GPUs are sliced and assigned during scheduling, there
> may be
> > > > > > reason,
> > > > > > > > > > although it looks that it would belong to the slot then.
> Is
> > > that
> > > > > > > what we
> > > > > > > > > > are doing here?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > In the current proposal, we do not have the GPUs sliced and
> > > > > assigned
> > > > > > to
> > > > > > > > > slots, because it could be problematic without dynamic slot
> > > > > > allocation.
> > > > > > > > > E.g., the number of GPUs might not be evenly divisible by
> the
> > > > > number
> > > > > > of
> > > > > > > > > slots.
> > > > > > > > >
> > > > > > > > > I think it makes sense to eventually have the GPUs
> assigned to
> > > > > slots.
> > > > > > > Even
> > > > > > > > > then, we might still need a TM level GPUManager (or
> > > > > ResourceProvider
> > > > > > > like
> > > > > > > > > Becket suggested). For memory, in each slot we can simply
> > > request
> > > > > the
> > > > > > > > > amount of memory, leaving it to JVM / OS to decide which
> memory
> > > > > > > (address)
> > > > > > > > > should be assigned. For GPU, and potentially other
> resources
> > > like
> > > > > > > FPGA, we
> > > > > > > > > need to explicitly specify which GPU (index) should be
> used.
> > > > > > > Therefore, we
> > > > > > > > > need some component at the TM level to coordinate which
> slot
> > > uses
> > > > > > which
> > > > > > > > > GPU.
> > > > > > > > >
> > > > > > > > > IMO, unless we say Flink will not support slot-level GPU
> > > slicing at
> > > > > > > least
> > > > > > > > > in the foreseeable future, I don't see a good way to avoid
> > > touching
> > > > > > > the TM
> > > > > > > > > core. To that end, I think Becket's suggestion points to a
> good
> > > > > > > direction,
> > > > > > > > > that supports more features (GPU, FPGA, etc.) with less
> > > coupling to
> > > > > > > the TM
> > > > > > > > > core (only needs to understand the general interfaces). The
> > > > > detailed
> > > > > > > > > implementation for specific resource types can even be
> > > encapsulated
> > > > > > as
> > > > > > > a
> > > > > > > > > library.
> > > > > > > > >
> > > > > > > > > @Becket
> > > > > > > > >
> > > > > > > > > Thanks for sharing your thought on the final state.
> Despite the
> > > > > > > details how
> > > > > > > > > the interfaces should look like, I think this is a really
> good
> > > > > > > abstraction
> > > > > > > > > for supporting general resource types.
> > > > > > > > >
> > > > > > > > > I'd like to further clarify that, the following three
> things
> > > are
> > > > > all
> > > > > > > that
> > > > > > > > > the "Flink core" needs to understand.
> > > > > > > > >
> > > > > > > > >    - The *amount* of resource, for scheduling. Actually, we
> > > already
> > > > > > > have
> > > > > > > > >    the Resource class in ResourceProfile and ResourceSpec
> for
> > > > > > extended
> > > > > > > > >    resource. It's just not really used.
> > > > > > > > >    - The *info*, that Flink provides to the operators /
> user
> > > codes.
> > > > > > > > >    - The *provider*, which generates the info based on the
> > > amount.
> > > > > > > > >
> > > > > > > > > The "core" does not need to understand the specific
> > > implementation
> > > > > > > details
> > > > > > > > > of the above three. They can even be implemented in a
> 3rd-party
> > > > > > > library.
> > > > > > > > > Similar to how we allow users to define their custom
> > > > > MetricReporter.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <
> > > becket.qin@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the comment, Stephan.
> > > > > > > > > >
> > > > > > > > > >   - If everything becomes a "core feature", it will make
> the
> > > > > > project
> > > > > > > hard
> > > > > > > > > > > to develop in the future. Thinking "library" /
> "plugin" /
> > > > > > > "extension"
> > > > > > > > > > style
> > > > > > > > > > > where possible helps.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Completely agree. It is much more important to design a mechanism than
> > > > > > > > > > focusing on a specific case. Here is what I am thinking to fully support
> > > > > > > > > > custom resource management:
> > > > > > > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
> > > > > > > > > > resource and the amount required. They will be used to find suitable TM
> > > > > > > > > > slots to run the tasks. At this point, the resources are only measured by
> > > > > > > > > > amount, i.e. they do not have individual IDs.
> > > > > > > > > >
> > > > > > > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to identify
> > > > > > > > > > and provide the detailed information of the individual resource, e.g. GPU
> > > > > > > > > > ID. It is important because the operator may have to explicitly interact
> > > > > > > > > > with the physical resource it uses. The ResourceInfoProvider might look
> > > > > > > > > > like something below.
> > > > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > > > > > > >             OperatorId opId, ResourceProfile resourceProfile);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > - There could be several "*ResourceInfoProvider*" configured on the TM to
> > > > > > > > > > retrieve the information for different resources.
> > > > > > > > > > - The TM will be responsible for assigning those individual resources to each
> > > > > > > > > > operator according to their requested amount.
> > > > > > > > > > - The operators will be able to get the ResourceInfo from their
> > > > > > > > > > RuntimeContext.
> > > > > > > > > >
> > > > > > > > > > If we agree this is a reasonable final state, we can adapt the current FLIP
> > > > > > > > > > to it. In fact, it does not sound like a big change to me. All the proposed
> > > > > > > > > > configuration can stay as is; it is just that Flink itself won't care about
> > > > > > > > > > them, instead a GPUInfoProvider implementing the ResourceInfoProvider will
> > > > > > > > > > use them.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> > > sewen@apache.org>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all!
> > > > > > > > > > >
> > > > > > > > > > > The main point I wanted to throw into the discussion
> is the
> > > > > > > following:
> > > > > > > > > > >   - With more and more use cases, more and more tools
> go
> > > into
> > > > > > Flink
> > > > > > > > > > >   - If everything becomes a "core feature", it will
> make
> > > the
> > > > > > > project
> > > > > > > > > hard
> > > > > > > > > > > to develop in the future. Thinking "library" /
> "plugin" /
> > > > > > > "extension"
> > > > > > > > > > style
> > > > > > > > > > > where possible helps.
> > > > > > > > > > >
> > > > > > > > > > >   - A good thought experiment is always: How many
> future
> > > > > > developers
> > > > > > > > > have
> > > > > > > > > > to
> > > > > > > > > > > interact with this code (and possibly understand it
> > > partially),
> > > > > > > even if
> > > > > > > > > > the
> > > > > > > > > > > features they touch have nothing to do with GPU
> support. If
> > > > > many
> > > > > > > > > > > contributors to unrelated features will have to touch
> it
> > > and
> > > > > > > understand
> > > > > > > > > > it,
> > > > > > > > > > > then let's think if there is a different solution.
> Maybe
> > > there
> > > > > is
> > > > > > > not,
> > > > > > > > > > but
> > > > > > > > > > > then we should be sure why.
> > > > > > > > > > >
> > > > > > > > > > >   - That led me to raising this issue: If the GPU
> manager
> > > > > > becomes a
> > > > > > > > > core
> > > > > > > > > > > service in the TaskManager, Environment,
> RuntimeContext,
> > > etc.
> > > > > > then
> > > > > > > > > > everyone
> > > > > > > > > > > developing TM and streaming tasks need to understand
> the
> > > GPU
> > > > > > > manager.
> > > > > > > > > > That
> > > > > > > > > > > seems oddly specific, is my impression.
> > > > > > > > > > >
> > > > > > > > > > > Access to configuration seems not the right reason to
> do
> > > that.
> > > > > We
> > > > > > > > > should
> > > > > > > > > > > expose the Flink configuration from the RuntimeContext
> > > anyways.
> > > > > > > > > > >
> > > > > > > > > > > If GPUs are sliced and assigned during scheduling,
> there
> > > may be
> > > > > > > reason,
> > > > > > > > > > > although it looks that it would belong to the slot
> then. Is
> > > > > that
> > > > > > > what
> > > > > > > > > we
> > > > > > > > > > > are doing here?
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Stephan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > > > >
> > > > > > > > > > > > IMO, eventually an operator should only see info of
> GPUs
> > > that
> > > > > > are
> > > > > > > > > > > dedicated
> > > > > > > > > > > > for it, instead of all GPUs on the machine/container
> in
> > > the
> > > > > > > current
> > > > > > > > > > > design.
> > > > > > > > > > > > It does not make sense to let the user who writes a
> UDF
> > > to
> > > > > > worry
> > > > > > > > > about
> > > > > > > > > > > > coordination among multiple operators running on the
> same
> > > > > > > machine.
> > > > > > > > > And
> > > > > > > > > > if
> > > > > > > > > > > > we want to limit the GPU info an operator sees, we
> > > should not
> > > > > > > let the
> > > > > > > > > > > > operator to instantiate GPUManager, which means we
> have
> > > to
> > > > > > expose
> > > > > > > > > > > something
> > > > > > > > > > > > through runtime context, either GPU info or some
> kind of
> > > > > > limited
> > > > > > > > > access
> > > > > > > > > > > to
> > > > > > > > > > > > the GPUManager.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> > > > > > becket.qin@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > It probably makes sense for us to first agree on the final state. More
> > > > > > > > > > > > > specifically, will the resource info be exposed through runtime context
> > > > > > > > > > > > > eventually?
> > > > > > > > > > > > >
> > > > > > > > > > > > > If that is the final state and we have a seamless migration story from this
> > > > > > > > > > > > > FLIP to that final state, personally I think it is OK to expose the GPU
> > > > > > > > > > > > > info in the runtime context.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > > > > > > > tonysong820@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > > > I think what Stephan means (@Stephan, please
> correct
> > > me
> > > > > if
> > > > > > > I'm
> > > > > > > > > > wrong)
> > > > > > > > > > > > is
> > > > > > > > > > > > > > that, we might not need to hold and maintain the
> > > > > GPUManager
> > > > > > > as a
> > > > > > > > > > > > service
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > TaskManagerServices or RuntimeContext. An
> > > alternative is
> > > > > to
> > > > > > > > > create
> > > > > > > > > > /
> > > > > > > > > > > > > > retrieve the GPUManager only in the operators
> that
> > > need
> > > > > it,
> > > > > > > e.g.,
> > > > > > > > > > > with
> > > > > > > > > > > > a
> > > > > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > > > I agree with you on excluding GPUManager from
> > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - For the first step, where we provide unified
> > > > > TM-level
> > > > > > > GPU
> > > > > > > > > > > > > information
> > > > > > > > > > > > > >    to all operators, it should be fine to have
> > > operators
> > > > > > > access /
> > > > > > > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > > > > > > >    - In future, we might have some more
> fine-grained
> > > GPU
> > > > > > > > > > management,
> > > > > > > > > > > > > where
> > > > > > > > > > > > > >    we need to maintain GPUManager as a service
> and
> > > put
> > > > > GPU
> > > > > > > info
> > > > > > > > > in
> > > > > > > > > > > slot
> > > > > > > > > > > > > >    profiles. But at least for now it's not
> necessary
> > > to
> > > > > > > introduce
> > > > > > > > > > > such
> > > > > > > > > > > > > >    complexity.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, I have some concerns on excluding
> GPUManager
> > > > > from
> > > > > > > > > > > > RuntimeContext
> > > > > > > > > > > > > > and let operators access it directly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - Configurations needed for creating the
> > > GPUManager is
> > > > > > not
> > > > > > > > > > always
> > > > > > > > > > > > > >    available for operators.
> > > > > > > > > > > > > >    - If later we want to have fine-grained
> control
> > > over
> > > > > GPU
> > > > > > > > > (e.g.,
> > > > > > > > > > > > > >    operators in each slot can only see GPUs
> reserved
> > > for
> > > > > > that
> > > > > > > > > > slot),
> > > > > > > > > > > > the
> > > > > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I would suggest to wrap the GPUManager behind
> > > > > > RuntimeContext
> > > > > > > and
> > > > > > > > > > only
> > > > > > > > > > > > > > expose the GPUInfo to users. For now, we can
> declare
> > > a
> > > > > > method
> > > > > > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a default
> > > > > definition
> > > > > > > that
> > > > > > > > > > > calls
> > > > > > > > > > > > > > `GPUManager.get()` to get the lazily-created
> > > GPUManager.
> > > > > If
> > > > > > > later
> > > > > > > > > > we
> > > > > > > > > > > > want
> > > > > > > > > > > > > > to create / retrieve GPUManager in a different
> way,
> > > we
> > > > > can
> > > > > > > simply
> > > > > > > > > > > > change
> > > > > > > > > > > > > > how `getGPUInfo` is implemented, without needing
> to
> > > > > change
> > > > > > > any
> > > > > > > > > > public
> > > > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > > Do you mean Minicluster? Yes, it makes sense to share the GPU Manager
> > > > > > > > > > > > > > > in such a scenario.
> > > > > > > > > > > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > > > > > > > > > > GPUManager (ExternalResourceManagers) in TaskExecutor instead of
> > > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > > > > > > > > > > > > > > info instead of the GPU Manager. AFAIK, it's the only place we could
> > > > > > > > > > > > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried
> <
> > > > > > > > > > > isaac@paddlesoft.net
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> > > > > > sewen@apache.org
> > > > > > > > > wrote
> > > > > > > > > > > > ----
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can we somehow keep this out of the
> > > TaskManager
> > > > > > > services
> > > > > > > > > > > > > > > > > I fear that we could not. IMO, the
> > > GPUManager(or
> > > > > > > > > > > > > > > > > ExternalServicesManagers in future) is
> > > conceptually
> > > > > > > one of
> > > > > > > > > > the
> > > > > > > > > > > > task
> > > > > > > > > > > > > > > > > manager services, just like MemoryManager
> > > before
> > > > > > 1.10.
> > > > > > > > > > > > > > > > > - It maintains/holds the GPU resource at TM
> > > level
> > > > > and
> > > > > > > all
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > > operators allocate the GPU resources from
> it.
> > > So,
> > > > > it
> > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > > > > > > - We could add a collection called
> > > > > > > ExternalResourceManagers
> > > > > > > > > > to
> > > > > > > > > > > > hold
> > > > > > > > > > > > > > > > > all managers of other external resources
> in the
> > > > > > future.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you help me understand why this needs the addition in
> > > > > > > > > > > > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > > > > > > > Are you worried about the case when multiple Task Executors run in the
> > > > > > > > > > > > > > > > same JVM? That's not common, but wouldn't it actually be good in that case
> > > > > > > > > > > > > > > > to share the GPU Manager, given that the GPU is shared?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > > > > > > In this FLIP, operators need the
> information.
> > > Thus,
> > > > > > we
> > > > > > > > > expose
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > information to the
> > > RuntimeContext/FunctionContext.
> > > > > > The
> > > > > > > slot
> > > > > > > > > > > > profile
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > not aware of GPU resources as GPU is TM
> level
> > > > > > resource
> > > > > > > now.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can the GPU Manager be a "self contained"
> > > thing
> > > > > > that
> > > > > > > > > simply
> > > > > > > > > > > > takes
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > configuration, and then abstracts
> everything
> > > > > > > internally?
> > > > > > > > > > > > > > > > > Yes, we just pass the path/args of the
> discover
> > > > > > script
> > > > > > > and
> > > > > > > > > > how
> > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > GPUs per TM to it. It takes the
> responsibility
> > > to
> > > > > get
> > > > > > > the
> > > > > > > > > GPU
> > > > > > > > > > > > > > > > > information and expose them to the
> > > > > > > > > > > RuntimeContext/FunctionContext
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > Operators. Meanwhile, we'd better not allow
> > > > > operators
> > > > > > > to
> > > > > > > > > > > directly
> > > > > > > > > > > > > > > > > access GPUManager, it should get what they
> want
> > > > > from
> > > > > > > > > Context.
> > > > > > > > > > > We
> > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > then decouple the interface/implementation
> of
> > > > > > > GPUManager
> > > > > > > > > and
> > > > > > > > > > > > Public
> > > > > > > > > > > > > > > > > API.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan
> Ewen <
> > > > > > > > > > sewen@apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It sounds fine to initially start with
> GPU
> > > > > specific
> > > > > > > > > support
> > > > > > > > > > > and
> > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > generalizing this once we better
> understand
> > > the
> > > > > > > space.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > About the implementation suggested in
> > > FLIP-108:
> > > > > > > > > > > > > > > > > > - Can we somehow keep this out of the
> > > TaskManager
> > > > > > > > > services?
> > > > > > > > > > > > > > Anything
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > have to pull through all layers of the TM
> > > makes
> > > > > the
> > > > > > > TM
> > > > > > > > > > > > components
> > > > > > > > > > > > > > yet
> > > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > > > > > > > -> do the slot profiles need information
> > > about
> > > > > the
> > > > > > > GPU?
> > > > > > > > > > > > > > > > > > -> Can the GPU Manager be a "self
> contained"
> > > > > thing
> > > > > > > that
> > > > > > > > > > > simply
> > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > > the configuration, and then abstracts
> > > everything
> > > > > > > > > > internally?
> > > > > > > > > > > > > > > Operators
> > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for all the feedbacks.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > > > > > > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > > > > > Regarding the general extended resource mechanism, I second Xintong's
> > > > > > > > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and ResourceSpec after we
> > > > > > > > > > > > > > > > > > > > > supporting fine-grained GPU scheduling. As a first step proposal, I
> > > > > > > > > > > > > > > > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > > > > > > > > > > > > > > > > correctly, it just a code refactoring atm, we could extract the
> > > > > > > > > > > > > > > > > > > > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > > > > > > > > > > > > > > > > > > > that is the case, +1 to do it during implementation.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark supports a general "Custom
> > > > > > > > > > > > > > > > > > > > > Resource Scheduling" before and decided to introduce a common resource
> > > > > > > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > > > > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > > > > > to make it more extensible. I think the "resource" is a proper level
> > > > > > > > > > > > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hxbks2ks@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > There is no doubt that GPU resource management support will greatly
> > > > > > > > > > > > > > > > > > > > > > facilitate the development of AI-related applications by PyFlink users.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Regarding the names of several GPU configurations, I think it is better to
> > > > > > > > > > > > > > > > > > > > > > delete the resource field makes it consistent with the names of other
> > > > > > > > > > > > > > > > > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Xintong Song <to...@gmail.com> wrote on Wed, Mar 4, 2020 at 10:39 AM:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline discussion about making
> > > > > > > > > > > > > > > > > > > > > > > the "GPU Support" as some general "Extended Resource Support". We believe
> > > > > > > > > > > > > > > > > > > > > > > supporting extended resources in a general mechanism is definitely a good
> > > > > > > > > > > > > > > > > > > > > > > and extensible way. The reason we propose this FLIP narrowing its scope
> > > > > > > > > > > > > > > > > > > > > > > down to GPU alone, is mainly for the concern on extra efforts and review
> > > > > > > > > > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > To come up with a well design on a general extended resource management
> > > > > > > > > > > > > > > > > > > > > > > mechanism, we would need to investigate more on how people use different
> > > > > > > > > > > > > > > > > > > > > > > kind of resources in practice. For GPU, we learnt such knowledge from the
> > > > > > > > > > > > > > > > > > > > > > > experts, Becket and his team members. But for FPGA, or other potential
> > > > > > > > > > > > > > > > > > > > > > > extended resources, we don't have such convenient information sources,
> > > > > > > > > > > > > > > > > > > > > > > making the investigation requires more efforts, which I tend to think is
> > > > > > > > > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On the other hand, we also looked into how Spark supports a general "Custom
> > > > > > > > > > > > > > > > > > > > > > > Resource Scheduling". Assuming we want to have a similar general extended
> > > > > > > > > > > > > > > > > > > > > > > resource mechanism in the future, we believe that the current GPU support
> > > > > > > > > > > > > > > > > > > > > > > design can be easily extended, in an incremental way without too many
> > > > > > > > > > > > > > > > > > > > > > > reworks.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > - The most important part is probably user interfaces. Spark offers
> > > > > > > > > > > > > > > > > > > > > > > configuration options to define the amount, discovery script and vendor
> > > > > > > > > > > > > > > > > > > > > > > (on k8s) in a per resource type bias [1], which is very similar to what we
> > > > > > > > > > > > > > > > > > > > > > > proposed in this FLIP. I think it's not necessary to expose config
> > > > > > > > > > > > > > > > > > > > > > > options in the general way atm, since we do not have supports for other
> > > > > > > > > > > > > > > > > > > > > > > resource types now. If later we decided to have per resource type config
> > > > > > > > > > > > > > > > > > > > > > > options, we can have backwards compatibility on the current proposed
> > > > > > > > > > > > > > > > > > > > > > > options with simple key mapping.
> > > > > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can change it to a "Extended
> > > > > > > > > > > > > > > > > > > > > > > Resource Manager" (or whatever it is called). That should be a pure
> > > > > > > > > > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are already fields for
> > > > > > > > > > > > > > > > > > > > > > > general extended resource. We can of course leverage them when
> > > > > > > > > > > > > > > > > > > > > > > supporting fine grained GPU scheduling. That is also not in the scope of
> > > > > > > > > > > > > > > > > > > > > > > this first step proposal, and would require FLIP-56 to be finished first.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > To summary up, I agree with Becket that have a separate FLIP for the
> > > > > > > > > > > > > > > > > > > > > > > general extended resource mechanism, and keep it in mind when discussing
> > > > > > > > > > > > > > > > > > > > > > > and implementing the current one.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes total sense to generalize the
> > > > > > > > > > > > > > > > > > > > > > > > resource management to support custom resources. Having that allows users
> > > > > > > > > > > > > > > > > > > > > > > > to add new resources by themselves. The general resource management may
> > > > > > > > > > > > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type definition. It is supported by the extended
> > > > > > > > > > > > > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This will likely cover
> > > > > > > > > > > > > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign the resources
> > > > > > > > > > > > > > > > > > > > > > > > to different tasks, operators, and so on. This may require two levels /
> > > > > > > > > > > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put into suitable slots.
> > > > > > > > > > > > > > > > > > > > > > > > It is done by the global RM and is not customizable right now.
> > > > > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource to the operators in TM. e.g.
> > > > > > > > > > > > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is needed assuming
> > > > > > > > > > > > > > > > > > > > > > > > the global RM does not distinguish individual resources of the same type.
> > > > > > > > > > > > > > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it should discover the
> > > > > > > > > > > > > > > > > > > > > > > > physical GPU information and bind/match them to each operators. Making this
> > > > > > > > > > > > > > > > > > > > > > > > general will fill in the missing piece to support custom resource type
> > > > > > > > > > > > > > > > > > > > > > > > definition. But I'd avoid calling it a "External Resource Manager" to avoid
> > > > > > > > > > > > > > > > > > > > > > > > confusion with RM, maybe something like "Operator Resource Assigner" would
> > > > > > > > > > > > > > > > > > > > > > > > be more accurate. So for each resource type users can have an optional
> > > > > > > > > > > > > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For memory, users don't need this,
> > > > > > > > > > > > > > > > > > > > > > > > but for other extended resources, users may need that.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource Assigner" is achievable
> > > > > > > > > > > > > > > > > > > > > > > > in this FLIP. But I am also OK with having that in a separate FLIP because
> > > > > > > > > > > > > > > > > > > > > > > > the interface between the "Operator Resource Assigner" and operator may
> > > > > > > > > > > > > > > > > > > > > > > > take a while to settle down if we want to make it generic. But I think our
> > > > > > > > > > > > > > > > > > > > > > > > implementation should take this future work into consideration so that we
> > > > > > > > > > > > > > > > > > > > > > > > don't need to break backwards compatibility once we have that.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the mechanics of GPU-aware scheduling
> > > > > > > > > > > > > > > > > > > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the proposal is if it makes sense to look at
> > > > > > > > > > > > > > > > > > > > > > > > > the "GPU Manager" as an "External Resource Manager", and GPU is one such
> > > > > > > > > > > > > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that is how it
> > > > > > > > > > > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > > > > > > > > > > It has the advantage that it looks more extensible. Maybe there is a GPU
> > > > > > > > > > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA Resource, a Alibaba
> > > > > > > > > > > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support is a
> > > > > > > > > > > > > > > > > > > > > > > > > > must-have for machine learning use cases. Actually it is one of the mostly
> > > > > > > > > > > > > > > > > > > > > > > > > > asked question from the users who are interested in using Flink for ML.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in the public
> > > > > > > > > > > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting using of GPU in Flink is significant,
> > > > > > > > > > > > > > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think it's a
> > > > > > > > > > > > > > > > > > > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a task executor and
> > > > > > > > > > > > > > > > > > > > > > > > > > > > forward such requirements to the external resource managers (for
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of available GPU resources to operators.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover
> > > > > > > > > > > > > > > > > > > > > > > > > > > > and expose GPU resource information to the context of functions.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU discovery, in which we provide
> > > > > > > > > > > > > > > > > > > > > > > > > > > > the privilege mode to help user to achieve worker-level isolation in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward
> > > > > > > > > > > > > > > > > > > > > > > > > > > > to your feedbacks.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Yangze Guo <ka...@gmail.com>.
Hi, Stephan,

I see your concern and I totally agree with you.

The interface on the RM side is now `Map<String key, String/Long value>
getYarn/KubernetesExternalResource()`. The only useful information the RM
gets from it is the configuration key of that external resource in
Yarn/K8s. The "String/Long value" would be the same as
external-resource.{resourceName}.amount.
So, I think it makes sense to replace these two interfaces with two
configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
may lose some extensibility, but AFAIK it would work for common
external resources like GPU and FPGA. WDYT?
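
To make the config-only idea concrete, here is a minimal sketch of how the
RM side might resolve the requests without loading any user code. The class
and method names are hypothetical, a plain Map stands in for Flink's
Configuration, and the resource key used in main is only an illustrative
value. The key patterns are the ones proposed above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: resolves external resource
// requests purely from configuration, so no user code runs in the RM.
final class ExternalResourceConfigSketch {

    // Returns (deployment-specific resource key -> amount) for the given
    // deployment, which is expected to be "yarn" or "kubernetes".
    static Map<String, Long> resolve(
            Map<String, String> conf, List<String> resourceNames, String deployment) {
        Map<String, Long> requests = new HashMap<>();
        for (String name : resourceNames) {
            String amount = conf.get("external-resource." + name + ".amount");
            String key = conf.get("external-resource." + name + "." + deployment + ".key");
            if (amount == null || key == null) {
                continue; // resource not configured for this deployment
            }
            requests.put(key, Long.parseLong(amount));
        }
        return requests;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("external-resource.gpu.amount", "2");
        conf.put("external-resource.gpu.kubernetes.key", "nvidia.com/gpu");
        System.out.println(resolve(conf, Arrays.asList("gpu"), "kubernetes"));
    }
}
```

A resource that has no key configured for a deployment is simply skipped,
which mirrors the idea that not every resource is supported everywhere.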

Best,
Yangze Guo

On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen <se...@apache.org> wrote:
>
> Maybe one final comment: It is probably not an issue, but let's try and
> keep user code (via user code classloader) out of the ResourceManager, if
> possible.
>
> As background:
>
> There were thoughts in the past to support setups where the RM must run
> with "superuser" credentials, but we cannot run JM/TM with these
> credentials, as the user code might access them otherwise.
> This is actually possible today, you can run the RM in a different JVM or
> in a different container, and give it more credentials than JMs / TMs. But
> for this to be feasible, we cannot allow any user-defined code to be in the
> JVM, because that instantaneously breaks the isolation of credentials.
>
>
>
> On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <ka...@gmail.com> wrote:
>
> > Thanks for the feedback, @Till and @Xintong.
> >
> > Regarding separating the interface, I'm also +1 for it.
> >
> > Regarding the resource allocation interface: true, it's dangerous to
> > give user code that much access. Changing the return type to Map<String
> > key, String/Long value> makes sense to me. AFAIK, it is compatible
> > with all the first-party supported resources for Yarn/Kubernetes. It
> > would also free us from the potential dependency issue.
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <to...@gmail.com>
> > wrote:
> > >
> > > Thanks for updating the FLIP, Yangze.
> > >
> > > I agree with Till that we probably want to separate the K8s/Yarn decorator
> > > calls. Users can still configure one driver class, and we can use
> > > `instanceof` to check whether the driver implements the K8s/Yarn specific
> > > interfaces.
> > >
> > > Moreover, I'm not sure about exposing the entire `ContainerRequest` / `Pod`
> > > (`AbstractKubernetesStepDecorator` directly manipulates the `Pod`) to user
> > > code. It gives user code more access than needed for defining an external
> > > resource, which might cause problems. Instead, I would suggest having an
> > > interface like `Map<String key, String value>
> > > getYarn/KubernetesExternalResource()` and assembling the entries into the
> > > `ContainerRequest` / `Pod` in the Yarn/KubernetesResourceManager.
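
A rough sketch of the two suggestions quoted here: deployment-specific
interfaces checked via `instanceof`, with the RM reading plain key/value
pairs. All type names below are hypothetical stand-ins, and the resource key
is only an illustrative value, not something the thread specifies.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical base interface: deployment-agnostic, so no Yarn/K8s types leak in.
interface ExternalResourceDriverSketch {
    String resourceName();
}

// Optional deployment-specific interfaces; a driver implements only those it supports.
interface YarnExternalResourceSketch {
    Map<String, String> getYarnExternalResource();
}

interface KubernetesExternalResourceSketch {
    Map<String, String> getKubernetesExternalResource();
}

// Example driver that supports Kubernetes only.
final class GpuDriverSketch
        implements ExternalResourceDriverSketch, KubernetesExternalResourceSketch {
    @Override
    public String resourceName() {
        return "gpu";
    }

    @Override
    public Map<String, String> getKubernetesExternalResource() {
        return Collections.singletonMap("nvidia.com/gpu", "1");
    }
}

final class ResourceManagerSketch {
    // The RM collects plain key/value pairs and would assemble them into its
    // own Pod; the driver never sees the Pod or the ContainerRequest.
    static Map<String, String> kubernetesResources(
            Iterable<? extends ExternalResourceDriverSketch> drivers) {
        Map<String, String> result = new HashMap<>();
        for (ExternalResourceDriverSketch driver : drivers) {
            if (driver instanceof KubernetesExternalResourceSketch) {
                result.putAll(
                        ((KubernetesExternalResourceSketch) driver).getKubernetesExternalResource());
            }
        }
        return result;
    }

    static Map<String, String> yarnResources(
            Iterable<? extends ExternalResourceDriverSketch> drivers) {
        Map<String, String> result = new HashMap<>();
        for (ExternalResourceDriverSketch driver : drivers) {
            if (driver instanceof YarnExternalResourceSketch) {
                result.putAll(((YarnExternalResourceSketch) driver).getYarnExternalResource());
            }
        }
        return result;
    }
}
```

A driver that does not implement the Yarn interface contributes nothing on
Yarn, which expresses "not supported on this deployment" without any stubs.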
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <tr...@apache.org>
> > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'm a bit late to the party. I think the current proposal looks good.
> > > >
> > > > Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> > > > would suggest to not include the decorator calls for Kubernetes and Yarn in
> > > > the base interface. Instead I would suggest to segregate the deployment
> > > > specific decorator calls into separate interfaces. That way an
> > > > ExternalResourceDriver does not have to support all deployments from the
> > > > very beginning. Moreover, some resources might not be supported by a
> > > > specific deployment target and the natural way to express this would be to
> > > > not implement the respective deployment specific interface.
> > > >
> > > > Moreover, having void
> > > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > > classpath whenever the external resource driver is being used.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <se...@apache.org>
> > wrote:
> > > >
> > > > > Nice, thanks a lot!
> > > > >
> > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <ka...@gmail.com>
> > wrote:
> > > > >
> > > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > > >
> > > > > > I've updated the FLIP accordingly. I do not add a
> > > > > > ResourceInfoProvider. Instead, I introduce the ExternalResourceDriver,
> > > > > > which takes the responsibility of all relevant operations on both RM
> > > > > > and TM sides.
> > > > > > After a rethink about decoupling the management of external resources
> > > > > > from TaskExecutor, I think we could do the same thing on the
> > > > > > ResourceManager side. We do not need to add a specific allocation
> > > > > > logic to the ResourceManager each time we add a specific external
> > > > > > resource.
> > > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > > containerRequest.
> > > > > > - For Kubernetes, ExternalResourceDriver could provide a decorator for
> > > > > > the TM pod.
> > > > > >
> > > > > > In this way, just like MetricReporter, we allow users to define their
> > > > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > > > separation of concerns. For more details, please take a look at [1].
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <se...@apache.org>
> > wrote:
> > > > > > >
> > > > > > > This sounds good to go ahead from my side.
> > > > > > >
> > > > > > > I like the approach that Becket suggested - in that case the core
> > > > > > > abstraction that everyone would need to understand would be "external
> > > > > > > resource allocation" and the "ResourceInfoProvider", and the GPU specific
> > > > > > > code would be a specific implementation only known to that component that
> > > > > > > allocates the external resource. That fits the separation of concerns
> > > > > > > well.
> > > > > > >
> > > > > > > I also understand that it should not be over-engineered in the first
> > > > > > > version, so some simplification makes sense, and then gradually expand from
> > > > > > > there.
> > > > > > >
> > > > > > > So +1 to go ahead with what was suggested above (Xintong / Becket) from my
> > > > > > > side.
> > > > > > >
> > > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <
> > tonysong820@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > > >
> > > > > > > > @Stephan
> > > > > > > >
> > > > > > > > I see your concern, and I completely agree with you that we should first
> > > > > > > > think about the "library" / "plugin" / "extension" style if possible.
> > > > > > > >
> > > > > > > > > If GPUs are sliced and assigned during scheduling, there may be reason,
> > > > > > > > > although it looks that it would belong to the slot then. Is that what we
> > > > > > > > > are doing here?
> > > > > > > >
> > > > > > > >
> > > > > > > > In the current proposal, we do not have the GPUs sliced and assigned to
> > > > > > > > slots, because it could be problematic without dynamic slot allocation.
> > > > > > > > E.g., the number of GPUs might not be evenly divisible by the number of
> > > > > > > > slots.
> > > > > > > >
> > > > > > > > I think it makes sense to eventually have the GPUs assigned to slots. Even
> > > > > > > > then, we might still need a TM level GPUManager (or ResourceProvider like
> > > > > > > > Becket suggested). For memory, in each slot we can simply request the
> > > > > > > > amount of memory, leaving it to JVM / OS to decide which memory (address)
> > > > > > > > should be assigned. For GPU, and potentially other resources like FPGA, we
> > > > > > > > need to explicitly specify which GPU (index) should be used. Therefore, we
> > > > > > > > need some component at the TM level to coordinate which slot uses which
> > > > > > > > GPU.
> > > > > > > >
> > > > > > > > IMO, unless we say Flink will not support slot-level GPU slicing at least
> > > > > > > > in the foreseeable future, I don't see a good way to avoid touching the TM
> > > > > > > > core. To that end, I think Becket's suggestion points to a good direction,
> > > > > > > > that supports more features (GPU, FPGA, etc.) with less coupling to the TM
> > > > > > > > core (only needs to understand the general interfaces). The detailed
> > > > > > > > implementation for specific resource types can even be encapsulated as a
> > > > > > > > library.
> > > > > > > >
> > > > > > > > @Becket
> > > > > > > >
> > > > > > > > Thanks for sharing your thoughts on the final state. Leaving aside the
> > > > > > > > details of how the interfaces should look, I think this is a really good
> > > > > > > > abstraction for supporting general resource types.
> > > > > > > >
> > > > > > > > I'd like to further clarify that the following three things are all that
> > > > > > > > the "Flink core" needs to understand.
> > > > > > > >
> > > > > > > >    - The *amount* of resource, for scheduling. Actually, we already have
> > > > > > > >    the Resource class in ResourceProfile and ResourceSpec for extended
> > > > > > > >    resources. It's just not really used.
> > > > > > > >    - The *info* that Flink provides to the operators / user code.
> > > > > > > >    - The *provider*, which generates the info based on the amount.
> > > > > > > >
> > > > > > > > The "core" does not need to understand the specific implementation details
> > > > > > > > of the above three. They can even be implemented in a 3rd-party library,
> > > > > > > > similar to how we allow users to define their custom MetricReporter.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <
> > becket.qin@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the comment, Stephan.
> > > > > > > > >
> > > > > > > > > >   - If everything becomes a "core feature", it will make the project hard
> > > > > > > > > > to develop in the future. Thinking "library" / "plugin" / "extension" style
> > > > > > > > > > where possible helps.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Completely agree. It is much more important to design a mechanism than
> > > > > > > > > focusing on a specific case. Here is what I am thinking to fully support
> > > > > > > > > custom resource management:
> > > > > > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
> > > > > > > > > resource and the amount required. They will be used to find suitable TM
> > > > > > > > > slots to run the tasks. At this point, the resources are only measured by
> > > > > > > > > amount, i.e. they do not have individual IDs.
> > > > > > > > >
> > > > > > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to identify
> > > > > > > > > and provide the detailed information of the individual resource, e.g. GPU
> > > > > > > > > ID. It is important because the operator may have to explicitly interact
> > > > > > > > > with the physical resource it uses. The ResourceInfoProvider might look
> > > > > > > > > like something below.
> > > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(OperatorId opId, ResourceProfile resourceProfile);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > - There could be several "*ResourceInfoProvider*" configured on the TM to
> > > > > > > > > retrieve the information for different resources.
> > > > > > > > > - The TM will be responsible to assign those individual resources to each
> > > > > > > > > operator according to their requested amount.
> > > > > > > > > - The operators will be able to get the ResourceInfo from their
> > > > > > > > > RuntimeContext.
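
To illustrate the provider idea quoted here, a minimal self-contained
sketch. It is not the proposed interface itself: String ids and a plain
amount stand in for Flink's AbstractID, operator id and ResourceProfile, and
every type name below is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the proposed provider interface.
interface ResourceInfoProviderSketch<INFO> {
    Map<String, INFO> retrieveResourceInfo(String operatorId, long amount);
}

// The info handed to an operator: here only the index of the physical GPU.
final class GpuInfoSketch {
    final int index;

    GpuInfoSketch(int index) {
        this.index = index;
    }
}

// Hands out GPU indexes in order until the TM-level pool is exhausted.
final class GpuInfoProviderSketch implements ResourceInfoProviderSketch<GpuInfoSketch> {
    private final int totalGpus;
    private int nextIndex = 0;

    GpuInfoProviderSketch(int totalGpus) {
        this.totalGpus = totalGpus;
    }

    @Override
    public synchronized Map<String, GpuInfoSketch> retrieveResourceInfo(
            String operatorId, long amount) {
        if (amount < 0 || nextIndex + amount > totalGpus) {
            throw new IllegalStateException(
                    "Not enough GPUs left for operator " + operatorId);
        }
        Map<String, GpuInfoSketch> infos = new LinkedHashMap<>();
        for (long i = 0; i < amount; i++) {
            infos.put("gpu-" + nextIndex, new GpuInfoSketch(nextIndex));
            nextIndex++;
        }
        return infos;
    }
}
```

The core would only need the generic interface; the GPU-specific class could
live in a separate library, as suggested in the thread.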
> > > > > > > > >
> > > > > > > > > If we agree this is a reasonable final state, we can adapt the current FLIP
> > > > > > > > > to it. In fact it does not sound like a big change to me. All the proposed
> > > > > > > > > configuration can stay as is; it is just that Flink itself won't care about
> > > > > > > > > it, instead a GPUInfoProvider implementing the ResourceInfoProvider will
> > > > > > > > > use it.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> > sewen@apache.org>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all!
> > > > > > > > > >
> > > > > > > > > > The main point I wanted to throw into the discussion is the following:
> > > > > > > > > >   - With more and more use cases, more and more tools go into Flink
> > > > > > > > > >   - If everything becomes a "core feature", it will make the project hard
> > > > > > > > > > to develop in the future. Thinking "library" / "plugin" / "extension" style
> > > > > > > > > > where possible helps.
> > > > > > > > > >
> > > > > > > > > >   - A good thought experiment is always: How many future developers have to
> > > > > > > > > > interact with this code (and possibly understand it partially), even if the
> > > > > > > > > > features they touch have nothing to do with GPU support. If many
> > > > > > > > > > contributors to unrelated features will have to touch it and understand it,
> > > > > > > > > > then let's think if there is a different solution. Maybe there is not, but
> > > > > > > > > > then we should be sure why.
> > > > > > > > > >
> > > > > > > > > >   - That led me to raising this issue: If the GPU manager becomes a core
> > > > > > > > > > service in the TaskManager, Environment, RuntimeContext, etc. then everyone
> > > > > > > > > > developing TM and streaming tasks need to understand the GPU manager. That
> > > > > > > > > > seems oddly specific, is my impression.
> > > > > > > > > >
> > > > > > > > > > Access to configuration seems not the right reason to do that. We should
> > > > > > > > > > expose the Flink configuration from the RuntimeContext anyways.
> > > > > > > > > >
> > > > > > > > > > If GPUs are sliced and assigned during scheduling, there may be reason,
> > > > > > > > > > although it looks that it would belong to the slot then. Is that what we
> > > > > > > > > > are doing here?
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stephan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > > > > > tonysong820@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > > >
> > > > > > > > > > > IMO, eventually an operator should only see info of GPUs that are dedicated
> > > > > > > > > > > to it, instead of all GPUs on the machine/container as in the current design.
> > > > > > > > > > > It does not make sense to let the user who writes a UDF worry about
> > > > > > > > > > > coordination among multiple operators running on the same machine. And if
> > > > > > > > > > > we want to limit the GPU info an operator sees, we should not let the
> > > > > > > > > > > operator instantiate GPUManager, which means we have to expose something
> > > > > > > > > > > through runtime context, either GPU info or some kind of limited access to
> > > > > > > > > > > the GPUManager.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> > > > > becket.qin@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > It probably makes sense for us to first agree on the final state. More
> > > > > > > > > > > > specifically, will the resource info be exposed through runtime context
> > > > > > > > > > > > eventually?
> > > > > > > > > > > >
> > > > > > > > > > > > If that is the final state and we have a seamless migration story from this
> > > > > > > > > > > > FLIP to that final state, personally I think it is OK to expose the GPU
> > > > > > > > > > > > info in the runtime context.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > > > > > > tonysong820@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > > I think what Stephan means (@Stephan, please correct me if I'm wrong) is
> > > > > > > > > > > > > that we might not need to hold and maintain the GPUManager as a service in
> > > > > > > > > > > > > TaskManagerServices or RuntimeContext. An alternative is to create /
> > > > > > > > > > > > > retrieve the GPUManager only in the operators that need it, e.g., with a
> > > > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > > I agree with you on excluding GPUManager from TaskManagerServices.
> > > > > > > > > > > > >
> > > > > > > > > > > > >    - For the first step, where we provide unified TM-level GPU information
> > > > > > > > > > > > >    to all operators, it should be fine to have operators access /
> > > > > > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > > > > > >    - In future, we might have some more fine-grained GPU management, where
> > > > > > > > > > > > >    we need to maintain GPUManager as a service and put GPU info in slot
> > > > > > > > > > > > >    profiles. But at least for now it's not necessary to introduce such
> > > > > > > > > > > > >    complexity.
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, I have some concerns about excluding GPUManager from RuntimeContext
> > > > > > > > > > > > > and letting operators access it directly.
> > > > > > > > > > > > >
> > > > > > > > > > > > >    - Configurations needed for creating the GPUManager are not always
> > > > > > > > > > > > >    available to operators.
> > > > > > > > > > > > >    - If later we want to have fine-grained control over GPU (e.g.,
> > > > > > > > > > > > >    operators in each slot can only see GPUs reserved for that slot), the
> > > > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I would suggest wrapping the GPUManager behind RuntimeContext and only
> > > > > > > > > > > > > exposing the GPUInfo to users. For now, we can declare a method
> > > > > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a default definition that calls
> > > > > > > > > > > > > `GPUManager.get()` to get the lazily-created GPUManager. If later we want
> > > > > > > > > > > > > to create / retrieve GPUManager in a different way, we can simply change
> > > > > > > > > > > > > how `getGPUInfo` is implemented, without needing to change any public
> > > > > > > > > > > > > interfaces.
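
A minimal sketch of the suggestion quoted here, assuming a lazily created
JVM-wide manager behind a default method. The types are hypothetical
stand-ins for GPUManager and RuntimeContext, and the hard-coded GPU list
replaces whatever the discovery script would report.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical lazily created, JVM-wide manager, as in the `GPUManager.get()` idea.
final class GpuManagerSketch {
    private static GpuManagerSketch instance;

    private final List<Integer> gpuIndexes;

    private GpuManagerSketch(List<Integer> gpuIndexes) {
        this.gpuIndexes = gpuIndexes;
    }

    static synchronized GpuManagerSketch get() {
        if (instance == null) {
            // A real implementation would run the discovery script here;
            // this sketch simply pretends one GPU with index 0 was found.
            instance = new GpuManagerSketch(Collections.singletonList(0));
        }
        return instance;
    }

    List<Integer> gpuIndexes() {
        return gpuIndexes;
    }
}

// Stand-in for RuntimeContext: user code sees only the info, never the manager.
interface RuntimeContextSketch {
    default List<Integer> getGPUInfo() {
        return GpuManagerSketch.get().gpuIndexes();
    }
}
```

Because user code only calls `getGPUInfo()`, the body of the default method
could later switch to a per-slot lookup without touching the public
signature.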
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> > > > > > karmagyz@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > Do you mean Minicluster? Yes, it makes sense to share the GPU Manager
> > > > > > > > > > > > > > in such a scenario.
> > > > > > > > > > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > > > > > > > > > GPUManager(ExternalResourceManagers) in TaskExecutor instead of
> > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > > > > > > > > > > > > > info instead of the GPU Manager. AFAIK, it's the only place we could
> > > > > > > > > > > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > > > > > > > > > isaac@paddlesoft.net
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> > > > > sewen@apache.org
> > > > > > > > wrote
> > > > > > > > > > > ----
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can we somehow keep this out of the
> > TaskManager
> > > > > > services
> > > > > > > > > > > > > > > > I fear that we could not. IMO, the
> > GPUManager(or
> > > > > > > > > > > > > > > > ExternalServicesManagers in future) is
> > conceptually
> > > > > > one of
> > > > > > > > > the
> > > > > > > > > > > task
> > > > > > > > > > > > > > > > manager services, just like MemoryManager
> > before
> > > > > 1.10.
> > > > > > > > > > > > > > > > - It maintains/holds the GPU resource at TM
> > level
> > > > and
> > > > > > all
> > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > > > > > > > operators allocate the GPU resources from it.
> > So,
> > > > it
> > > > > > should
> > > > > > > > > be
> > > > > > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > > > > > - We could add a collection called
> > > > > > ExternalResourceManagers
> > > > > > > > > to
> > > > > > > > > > > hold
> > > > > > > > > > > > > > > > all managers of other external resources in the
> > > > > future.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you help me understand why this needs the addition in TaskManagerServices
> > > > > > > > > > > > > > > or in the RuntimeContext?
> > > > > > > > > > > > > > > Are you worried about the case when multiple Task Executors run in the
> > > > > > > > > > > > > > > same JVM? That's not common, but wouldn't it actually be good in that case to
> > > > > > > > > > > > > > > share the GPU Manager, given that the GPU is shared?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > > > > > In this FLIP, operators need the information.
> > Thus,
> > > > > we
> > > > > > > > expose
> > > > > > > > > > GPU
> > > > > > > > > > > > > > > > information to the
> > RuntimeContext/FunctionContext.
> > > > > The
> > > > > > slot
> > > > > > > > > > > profile
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > not aware of GPU resources as GPU is TM level
> > > > > resource
> > > > > > now.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can the GPU Manager be a "self contained"
> > thing
> > > > > that
> > > > > > > > simply
> > > > > > > > > > > takes
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > configuration, and then abstracts everything
> > > > > > internally?
> > > > > > > > > > > > > > > > Yes, we just pass the path/args of the discover
> > > > > script
> > > > > > and
> > > > > > > > > how
> > > > > > > > > > > many
> > > > > > > > > > > > > > > > GPUs per TM to it. It takes the responsibility
> > to
> > > > get
> > > > > > the
> > > > > > > > GPU
> > > > > > > > > > > > > > > > information and expose them to the
> > > > > > > > > > RuntimeContext/FunctionContext
> > > > > > > > > > > > of
> > > > > > > > > > > > > > > > Operators. Meanwhile, we'd better not allow
> > > > operators
> > > > > > to
> > > > > > > > > > directly
> > > > > > > > > > > > > > > > access GPUManager, it should get what they want
> > > > from
> > > > > > > > Context.
> > > > > > > > > > We
> > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > then decouple the interface/implementation of
> > > > > > GPUManager
> > > > > > > > and
> > > > > > > > > > > Public
> > > > > > > > > > > > > > > > API.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> > > > > > > > > sewen@apache.org
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > It sounds fine to initially start with GPU
> > > > specific
> > > > > > > > support
> > > > > > > > > > and
> > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > generalizing this once we better understand
> > the
> > > > > > space.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > About the implementation suggested in
> > FLIP-108:
> > > > > > > > > > > > > > > > > - Can we somehow keep this out of the
> > TaskManager
> > > > > > > > services?
> > > > > > > > > > > > > Anything
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > have to pull through all layers of the TM
> > makes
> > > > the
> > > > > > TM
> > > > > > > > > > > components
> > > > > > > > > > > > > yet
> > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > > > > > > -> do the slot profiles need information
> > about
> > > > the
> > > > > > GPU?
> > > > > > > > > > > > > > > > > -> Can the GPU Manager be a "self contained"
> > > > thing
> > > > > > that
> > > > > > > > > > simply
> > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > the configuration, and then abstracts
> > everything
> > > > > > > > > internally?
> > > > > > > > > > > > > > Operators
> > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > > > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > > Regarding the general extended resource mechanism, I second Xintong's
> > > > > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and ResourceSpec once we
> > > > > > > > > > > > > > > > > > support fine-grained GPU scheduling. As a first step proposal, I
> > > > > > > > > > > > > > > > > > prefer not to include it in the scope of this FLIP.
> > > > > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > > > > > > > > > > > > > correctly, it is just a code refactoring atm; we could extract the
> > > > > > > > > > > > > > > > > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > > > > > > > > > > > > > > > > that is the case, +1 to do it during implementation.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark supports a general "Custom
> > > > > > > > > > > > > > > > > > Resource Scheduling" before and decided to introduce a common resource
> > > > > > > > > > > > > > > > > > configuration schema
> > > > > > > > > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > > to make it more extensible. I think "resource" is a proper level
> > > > > > > > > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hxbks2ks@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > There is no doubt that GPU resource management support will greatly
> > > > > > > > > > > > > > > > > > > facilitate the development of AI-related applications by PyFlink users.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Regarding the names of several GPU configurations, I think it is better
> > > > > > > > > > > > > > > > > > > to delete the resource field, which makes them consistent with the names
> > > > > > > > > > > > > > > > > > > of other resource-related configurations in TaskManagerOption.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <to...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline discussion about making
> > > > > > > > > > > > > > > > > > > > the "GPU Support" a general "Extended Resource Support". We believe
> > > > > > > > > > > > > > > > > > > > supporting extended resources through a general mechanism is definitely a
> > > > > > > > > > > > > > > > > > > > good and extensible way. The reason we narrowed the scope of this FLIP
> > > > > > > > > > > > > > > > > > > > down to GPU alone is mainly the concern about the extra effort and review
> > > > > > > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > To come up with a good design for a general extended resource management
> > > > > > > > > > > > > > > > > > > > mechanism, we would need to investigate more on how people use different
> > > > > > > > > > > > > > > > > > > > kinds of resources in practice. For GPU, we learnt such knowledge from the
> > > > > > > > > > > > > > > > > > > > experts, Becket and his team members. But for FPGA, or other potential
> > > > > > > > > > > > > > > > > > > > extended resources, we don't have such convenient information sources,
> > > > > > > > > > > > > > > > > > > > so the investigation would require more effort, which I tend to think is
> > > > > > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On the other hand, we also looked into how Spark supports a general
> > > > > > > > > > > > > > > > > > > > "Custom Resource Scheduling". Assuming we want to have a similar general
> > > > > > > > > > > > > > > > > > > > extended resource mechanism in the future, we believe that the current
> > > > > > > > > > > > > > > > > > > > GPU support design can be easily extended, in an incremental way without
> > > > > > > > > > > > > > > > > > > > too much rework.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > - The most important part is probably user interfaces. Spark offers
> > > > > > > > > > > > > > > > > > > > configuration options to define the amount, discovery script and vendor
> > > > > > > > > > > > > > > > > > > > (on k8s) on a per resource type basis [1], which is very similar to what
> > > > > > > > > > > > > > > > > > > > we proposed in this FLIP. I think it's not necessary to expose config
> > > > > > > > > > > > > > > > > > > > options in the general way atm, since we do not have support for other
> > > > > > > > > > > > > > > > > > > > resource types now. If later we decide to have per resource type config
> > > > > > > > > > > > > > > > > > > > options, we can keep backwards compatibility with the currently proposed
> > > > > > > > > > > > > > > > > > > > options through simple key mapping.
> > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can change it to an "Extended
> > > > > > > > > > > > > > > > > > > > Resource Manager" (or whatever it is called). That should be a pure
> > > > > > > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are already fields for
> > > > > > > > > > > > > > > > > > > > general extended resources. We can of course leverage them when
> > > > > > > > > > > > > > > > > > > > supporting fine grained GPU scheduling. That is also not in the scope of
> > > > > > > > > > > > > > > > > > > > this first step proposal, and would require FLIP-56 to be finished first.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate FLIP for
> > > > > > > > > > > > > > > > > > > > the general extended resource mechanism, and keep it in mind when
> > > > > > > > > > > > > > > > > > > > discussing and implementing the current one.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes total sense to generalize the
> > > > > > > > > > > > > > > > > > > > > resource management to support custom resources. Having that allows
> > > > > > > > > > > > > > > > > > > > > users to add new resources by themselves. The general resource
> > > > > > > > > > > > > > > > > > > > > management may involve two different aspects:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 1. The custom resource type definition. It is supported by the extended
> > > > > > > > > > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This will likely cover
> > > > > > > > > > > > > > > > > > > > > the majority of the cases.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > > > > > > > > > > > > > > > > > > > resources to different tasks, operators, and so on. This may require
> > > > > > > > > > > > > > > > > > > > > two levels / steps:
> > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put into suitable slots.
> > > > > > > > > > > > > > > > > > > > > It is done by the global RM and is not customizable right now.
> > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource to the operators in TM. e.g.
> > > > > > > > > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > > > > > > > > > > > > > > > > > > > assuming the global RM does not distinguish individual resources of the
> > > > > > > > > > > > > > > > > > > > > same type. It is true for memory, but not for GPU.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it should discover the
> > > > > > > > > > > > > > > > > > > > > physical GPU information and bind/match them to each operator. Making
> > > > > > > > > > > > > > > > > > > > > this general will fill in the missing piece to support custom resource
> > > > > > > > > > > > > > > > > > > > > type definition. But I'd avoid calling it an "External Resource Manager"
> > > > > > > > > > > > > > > > > > > > > to avoid confusion with RM; maybe something like "Operator Resource
> > > > > > > > > > > > > > > > > > > > > Assigner" would be more accurate. So for each resource type users can
> > > > > > > > > > > > > > > > > > > > > have an optional "Operator Resource Assigner" in the TM. For memory,
> > > > > > > > > > > > > > > > > > > > > users don't need this, but for other extended resources, users may need
> > > > > > > > > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > > > > > > > > > > > > > > > > > > > achievable in this FLIP. But I am also OK with having that in a separate
> > > > > > > > > > > > > > > > > > > > > FLIP because the interface between the "Operator Resource Assigner" and
> > > > > > > > > > > > > > > > > > > > > operator may take a while to settle down if we want to make it generic.
> > > > > > > > > > > > > > > > > > > > > But I think our implementation should take this future work into
> > > > > > > > > > > > > > > > > > > > > consideration so that we don't need to break backwards compatibility
> > > > > > > > > > > > > > > > > > > > > once we have that.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > > > > > > > > > > > > > > > > > scheduling and GPU allocation, as I have no experience with that.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the proposal is if it makes sense to look
> > > > > > > > > > > > > > > > > > > > > > at the "GPU Manager" as an "External Resource Manager", and GPU is one
> > > > > > > > > > > > > > > > > > > > > > such resource.
> > > > > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that is how
> > > > > > > > > > > > > > > > > > > > > > it is done there.
> > > > > > > > > > > > > > > > > > > > > > It has the advantage that it looks more extensible. Maybe there is a GPU
> > > > > > > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an
> > > > > > > > > > > > > > > > > > > > > > Alibaba TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support is a
> > > > > > > > > > > > > > > > > > > > > > > must-have for machine learning use cases. Actually it is one of the
> > > > > > > > > > > > > > > > > > > > > > > most frequently asked questions from users who are interested in
> > > > > > > > > > > > > > > > > > > > > > > using Flink for ML.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions on the wiki.
> > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in the
> > > > > > > > > > > > > > > > > > > > > > > public interface section.
> > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink is
> > > > > > > > > > > > > > > > > > > > > > > > significant, especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think it's
> > > > > > > > > > > > > > > > > > > > > > > > a very good first step for Flink's GPU support.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a task executor and
> > > > > > > > > > > > > > > > > > > > > > > > > forward such requirements to the external resource managers (for
> > > > > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of available GPU resources to operators.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover
> > > > > > > > > > > > > > > > > > > > > > > > > and expose GPU resource information to the context of functions.
> > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU discovery, in which we provide
> > > > > > > > > > > > > > > > > > > > > > > > > the privilege mode to help user to achieve worker-level isolation in
> > > > > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward
> > > > > > > > > > > > > > > > > > > > > > > > > to your feedbacks.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > > > >

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Stephan Ewen <se...@apache.org>.
Maybe one final comment: It is probably not an issue, but let's try and
keep user code (via user code classloader) out of the ResourceManager, if
possible.

As background:

There were thoughts in the past to support setups where the RM must run
with "superuser" credentials, but we cannot run JM/TM with these
credentials, as the user code might access them otherwise.
This is actually possible today: you can run the RM in a different JVM or
in a different container, and give it more credentials than JMs / TMs. But
for this to be feasible, we cannot allow any user-defined code to be in the
RM's JVM, because that immediately breaks the isolation of credentials.



On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <ka...@gmail.com> wrote:

> Thanks for the feedback, @Till and @Xintong.
>
> Regarding separating the interface, I'm also +1 for it.
>
> Regarding the resource allocation interface, true, it's dangerous to
> give too much access to user code. Changing the return type to Map<String
> key, String/Long value> makes sense to me. AFAIK, it is compatible
> with all the first-party supported resources for Yarn/Kubernetes. It
> could also free us from the potential dependency issue.
>
> Best,
> Yangze Guo
>
> On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <to...@gmail.com> wrote:
> >
> > Thanks for updating the FLIP, Yangze.
> >
> > I agree with Till that we probably want to separate the K8s/Yarn
> > decorator calls. Users can still configure one driver class, and we can
> > use `instanceof` to check whether the driver implements K8s/Yarn
> > specific interfaces.
> >
> > Moreover, I'm not sure about exposing the entire `ContainerRequest` /
> > `Pod` (`AbstractKubernetesStepDecorator` directly manipulates the `Pod`)
> > to user code. It gives user code more access than is needed for defining
> > an external resource, which might cause problems. Instead, I would
> > suggest having an interface like `Map<String key, String value>
> > getYarn/KubernetesExternalResource()` and assembling the results into
> > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <tr...@apache.org>
> wrote:
> >
> > > Hi everyone,
> > >
> > > I'm a bit late to the party. I think the current proposal looks good.
> > >
> > > Concerning the ExternalResourceDriver interface defined in the FLIP
> > > [1], I would suggest not including the decorator calls for Kubernetes
> > > and Yarn in the base interface. Instead, I would suggest segregating
> > > the deployment specific decorator calls into separate interfaces. That
> > > way an ExternalResourceDriver does not have to support all deployments
> > > from the very beginning. Moreover, some resources might not be
> > > supported by a specific deployment target, and the natural way to
> > > express this would be to not implement the respective deployment
> > > specific interface.
> > >
> > > Moreover, having void
> > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > > classpath whenever the external resource driver is being used.
> > >
> > > [1]
> > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <se...@apache.org>
> wrote:
> > >
> > > > Nice, thanks a lot!
> > > >
> > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <ka...@gmail.com>
> wrote:
> > > >
> > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > >
> > > > > I've updated the FLIP accordingly. I did not add a
> > > > > ResourceInfoProvider. Instead, I introduced the
> > > > > ExternalResourceDriver, which takes responsibility for all
> > > > > relevant operations on both the RM and TM sides.
> > > > > After rethinking how to decouple the management of external
> > > > > resources from the TaskExecutor, I think we could do the same
> > > > > thing on the ResourceManager side. We do not need to add specific
> > > > > allocation logic to the ResourceManager each time we add a
> > > > > specific external resource.
> > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > containerRequest.
> > > > > - For Kubernetes, the ExternalResourceDriver could provide a
> > > > > decorator for the TM pod.
> > > > >
> > > > > In this way, just like MetricReporter, we allow users to define
> their
> > > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > > separation of concerns. For more details, please take a look at
> [1].
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <se...@apache.org>
> wrote:
> > > > > >
> > > > > > This sounds good to go ahead from my side.
> > > > > >
> > > > > > I like the approach that Becket suggested - in that case the core
> > > > > > abstraction that everyone would need to understand would be
> "external
> > > > > > resource allocation" and the "ResourceInfoProvider", and the GPU
> > > > specific
> > > > > > code would be a specific implementation only known to that
> component
> > > > that
> > > > > > allocates the external resource. That fits the separation of
> concerns
> > > > > well.
> > > > > >
> > > > > > I also understand that it should not be over-engineered in the
> first
> > > > > > version, so some simplification makes sense, and then gradually
> > > expand
> > > > > from
> > > > > > there.
> > > > > >
> > > > > > So +1 to go ahead with what was suggested above (Xintong /
> Becket)
> > > from
> > > > > my
> > > > > > side.
> > > > > >
> > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <
> tonysong820@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > >
> > > > > > > @Stephan
> > > > > > >
> > > > > > > I see your concern, and I completely agree with you that we
> should
> > > > > first
> > > > > > > think about the "library" / "plugin" / "extension" style if
> > > possible.
> > > > > > >
> > > > > > > If GPUs are sliced and assigned during scheduling, there may be
> > > > reason,
> > > > > > > > although it looks that it would belong to the slot then. Is
> that
> > > > > what we
> > > > > > > > are doing here?
> > > > > > >
> > > > > > >
> > > > > > > In the current proposal, we do not have the GPUs sliced and
> > > assigned
> > > > to
> > > > > > > slots, because it could be problematic without dynamic slot
> > > > allocation.
> > > > > > > E.g., the number of GPUs might not be evenly divisible by the
> > > number
> > > > of
> > > > > > > slots.
> > > > > > >
> > > > > > > I think it makes sense to eventually have the GPUs assigned to
> > > slots.
> > > > > Even
> > > > > > > then, we might still need a TM level GPUManager (or
> > > ResourceProvider
> > > > > like
> > > > > > > Becket suggested). For memory, in each slot we can simply
> request
> > > the
> > > > > > > amount of memory, leaving it to JVM / OS to decide which memory
> > > > > (address)
> > > > > > > should be assigned. For GPU, and potentially other resources
> like
> > > > > FPGA, we
> > > > > > > need to explicitly specify which GPU (index) should be used.
> > > > > Therefore, we
> > > > > > > need some component at the TM level to coordinate which slot
> uses
> > > > which
> > > > > > > GPU.
> > > > > > >
> > > > > > > IMO, unless we say Flink will not support slot-level GPU
> slicing at
> > > > > least
> > > > > > > in the foreseeable future, I don't see a good way to avoid
> touching
> > > > > the TM
> > > > > > > core. To that end, I think Becket's suggestion points to a good
> > > > > direction,
> > > > > > > that supports more features (GPU, FPGA, etc.) with less
> coupling to
> > > > > the TM
> > > > > > > core (only needs to understand the general interfaces). The
> > > detailed
> > > > > > > implementation for specific resource types can even be
> encapsulated
> > > > as
> > > > > a
> > > > > > > library.
> > > > > > >
> > > > > > > @Becket
> > > > > > >
> > > > > > > Thanks for sharing your thoughts on the final state. Regardless
> > > > > > > of the details of how the interfaces should look, I think this
> > > > > > > is a really good abstraction for supporting general resource
> > > > > > > types.
> > > > > > >
> > > > > > > I'd like to further clarify that, the following three things
> are
> > > all
> > > > > that
> > > > > > > the "Flink core" needs to understand.
> > > > > > >
> > > > > > >    - The *amount* of resource, for scheduling. Actually, we
> already
> > > > > have
> > > > > > >    the Resource class in ResourceProfile and ResourceSpec for
> > > > extended
> > > > > > >    resource. It's just not really used.
> > > > > > >    - The *info*, that Flink provides to the operators / user
> codes.
> > > > > > >    - The *provider*, which generates the info based on the
> amount.
> > > > > > >
> > > > > > > The "core" does not need to understand the specific
> implementation
> > > > > details
> > > > > > > of the above three. They can even be implemented in a 3rd-party
> > > > > library.
> > > > > > > Similar to how we allow users to define their custom
> > > MetricReporter.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <
> becket.qin@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the comment, Stephan.
> > > > > > > >
> > > > > > > >   - If everything becomes a "core feature", it will make the
> > > > project
> > > > > hard
> > > > > > > > > to develop in the future. Thinking "library" / "plugin" /
> > > > > "extension"
> > > > > > > > style
> > > > > > > > > where possible helps.
> > > > > > > >
> > > > > > > >
> > > > > > > > Completely agree. It is much more important to design a
> mechanism
> > > > > than
> > > > > > > > focusing on a specific case. Here is what I am thinking to
> fully
> > > > > support
> > > > > > > > custom resource management:
> > > > > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec
> to
> > > > > define
> > > > > > > the
> > > > > > > > resource and the amount required. They will be used to find
> > > > suitable
> > > > > TMs
> > > > > > > > slots to run the tasks. At this point, the resources are only
> > > > > measured by
> > > > > > > > amount, i.e. they do not have individual ID.
> > > > > > > >
> > > > > > > > 2. On the TM side, have something like *"ResourceInfoProvider"*
> > > > > > > > to identify and provide the detailed information of each
> > > > > > > > individual resource, e.g. the GPU ID. This is important because
> > > > > > > > the operator may have to explicitly interact with the physical
> > > > > > > > resource it uses. The ResourceInfoProvider might look like
> > > > > > > > something below.
> > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > > > > >         OperatorId opId, ResourceProfile resourceProfile);
> > > > > > > > }
> > > > > > > >
> > > > > > > > - There could be several "*ResourceInfoProvider*" configured on
> > > > > > > > the TM to retrieve the information for different resources.
> > > > > > > > - The TM will be responsible for assigning those individual
> > > > > > > > resources to each operator according to their requested amount.
> > > > > > > > - The operators will be able to get the ResourceInfo from their
> > > > > > > > RuntimeContext.
> > > > > > > >
> > > > > > > > If we agree this is a reasonable final state, we can adapt the
> > > > > > > > current FLIP to it. In fact, it does not sound like a big change
> > > > > > > > to me. All the proposed configuration can stay as is; it is just
> > > > > > > > that Flink itself won't care about them. Instead, a
> > > > > > > > GPUInfoProvider implementing the ResourceInfoProvider will use
> > > > > > > > them.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> sewen@apache.org>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all!
> > > > > > > > >
> > > > > > > > > The main point I wanted to throw into the discussion is the
> > > > > following:
> > > > > > > > >   - With more and more use cases, more and more tools go
> into
> > > > Flink
> > > > > > > > >   - If everything becomes a "core feature", it will make
> the
> > > > > project
> > > > > > > hard
> > > > > > > > > to develop in the future. Thinking "library" / "plugin" /
> > > > > "extension"
> > > > > > > > style
> > > > > > > > > where possible helps.
> > > > > > > > >
> > > > > > > > >   - A good thought experiment is always: How many future
> > > > developers
> > > > > > > have
> > > > > > > > to
> > > > > > > > > interact with this code (and possibly understand it
> partially),
> > > > > even if
> > > > > > > > the
> > > > > > > > > features they touch have nothing to do with GPU support. If
> > > many
> > > > > > > > > contributors to unrelated features will have to touch it
> and
> > > > > understand
> > > > > > > > it,
> > > > > > > > > then let's think if there is a different solution. Maybe
> there
> > > is
> > > > > not,
> > > > > > > > but
> > > > > > > > > then we should be sure why.
> > > > > > > > >
> > > > > > > > >   - That led me to raising this issue: If the GPU manager
> > > > becomes a
> > > > > > > core
> > > > > > > > > service in the TaskManager, Environment, RuntimeContext,
> etc.
> > > > then
> > > > > > > > everyone
> > > > > > > > > developing TM and streaming tasks need to understand the
> GPU
> > > > > manager.
> > > > > > > > That
> > > > > > > > > seems oddly specific, is my impression.
> > > > > > > > >
> > > > > > > > > Access to configuration seems not the right reason to do
> that.
> > > We
> > > > > > > should
> > > > > > > > > expose the Flink configuration from the RuntimeContext
> anyways.
> > > > > > > > >
> > > > > > > > > If GPUs are sliced and assigned during scheduling, there
> may be
> > > > > reason,
> > > > > > > > > although it looks that it would belong to the slot then. Is
> > > that
> > > > > what
> > > > > > > we
> > > > > > > > > are doing here?
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stephan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > > > > tonysong820@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > >
> > > > > > > > > > IMO, eventually an operator should only see info of GPUs
> that
> > > > are
> > > > > > > > > dedicated
> > > > > > > > > > for it, instead of all GPUs on the machine/container in
> the
> > > > > current
> > > > > > > > > design.
> > > > > > > > > > It does not make sense to let the user who writes a UDF
> to
> > > > worry
> > > > > > > about
> > > > > > > > > > coordination among multiple operators running on the same
> > > > > machine.
> > > > > > > And
> > > > > > > > if
> > > > > > > > > > we want to limit the GPU info an operator sees, we
> should not
> > > > > let the
> > > > > > > > > > operator to instantiate GPUManager, which means we have
> to
> > > > expose
> > > > > > > > > something
> > > > > > > > > > through runtime context, either GPU info or some kind of
> > > > limited
> > > > > > > access
> > > > > > > > > to
> > > > > > > > > > the GPUManager.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> > > > becket.qin@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > It probably makes sense for us to first agree on the final
> > > > > > > > > > > state. More specifically, will the resource info be exposed
> > > > > > > > > > > through the runtime context eventually?
> > > > > > > > > > >
> > > > > > > > > > > If that is the final state and we have a seamless migration
> > > > > > > > > > > story from this FLIP to that final state, personally I think
> > > > > > > > > > > it is OK to expose the GPU info in the runtime context.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > > > > > tonysong820@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > I think what Stephan means (@Stephan, please correct
> me
> > > if
> > > > > I'm
> > > > > > > > wrong)
> > > > > > > > > > is
> > > > > > > > > > > > that, we might not need to hold and maintain the
> > > GPUManager
> > > > > as a
> > > > > > > > > > service
> > > > > > > > > > > in
> > > > > > > > > > > > TaskManagerServices or RuntimeContext. An
> alternative is
> > > to
> > > > > > > create
> > > > > > > > /
> > > > > > > > > > > > retrieve the GPUManager only in the operators that
> need
> > > it,
> > > > > e.g.,
> > > > > > > > > with
> > > > > > > > > > a
> > > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > > >
> > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > I agree with you on excluding GPUManager from
> > > > > > > TaskManagerServices.
> > > > > > > > > > > >
> > > > > > > > > > > >    - For the first step, where we provide unified
> > > TM-level
> > > > > GPU
> > > > > > > > > > > information
> > > > > > > > > > > >    to all operators, it should be fine to have
> operators
> > > > > access /
> > > > > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > > > > >    - In future, we might have some more fine-grained
> GPU
> > > > > > > > management,
> > > > > > > > > > > where
> > > > > > > > > > > >    we need to maintain GPUManager as a service and
> put
> > > GPU
> > > > > info
> > > > > > > in
> > > > > > > > > slot
> > > > > > > > > > > >    profiles. But at least for now it's not necessary
> to
> > > > > introduce
> > > > > > > > > such
> > > > > > > > > > > >    complexity.
> > > > > > > > > > > >
> > > > > > > > > > > > However, I have some concerns on excluding GPUManager
> > > from
> > > > > > > > > > RuntimeContext
> > > > > > > > > > > > and let operators access it directly.
> > > > > > > > > > > >
> > > > > > > > > > > >    - The configuration needed for creating the GPUManager
> > > > > > > > > > > >    is not always available to operators.
> > > > > > > > > > > >    - If later we want to have fine-grained control
> over
> > > GPU
> > > > > > > (e.g.,
> > > > > > > > > > > >    operators in each slot can only see GPUs reserved
> for
> > > > that
> > > > > > > > slot),
> > > > > > > > > > the
> > > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > > >
> > > > > > > > > > > > I would suggest to wrap the GPUManager behind
> > > > RuntimeContext
> > > > > and
> > > > > > > > only
> > > > > > > > > > > > expose the GPUInfo to users. For now, we can declare
> a
> > > > method
> > > > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a default
> > > definition
> > > > > that
> > > > > > > > > calls
> > > > > > > > > > > > `GPUManager.get()` to get the lazily-created
> GPUManager.
> > > If
> > > > > later
> > > > > > > > we
> > > > > > > > > > want
> > > > > > > > > > > > to create / retrieve GPUManager in a different way,
> we
> > > can
> > > > > simply
> > > > > > > > > > change
> > > > > > > > > > > > how `getGPUInfo` is implemented, without needing to
> > > change
> > > > > any
> > > > > > > > public
> > > > > > > > > > > > interfaces.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> > > > > karmagyz@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > Do you mean MiniCluster? Yes, it makes sense to share the
> > > > > > > > > > > > > GPU Manager in such a scenario.
> > > > > > > > > > > > > If that's what you worry about, I'm +1 for holding the
> > > > > > > > > > > > > GPUManager (ExternalResourceManagers) in TaskExecutor
> > > > > > > > > > > > > instead of TaskManagerServices.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding the RuntimeContext/FunctionContext, it
> just
> > > > > holds the
> > > > > > > > GPU
> > > > > > > > > > > > > info instead of the GPU Manager. AFAIK, it's the
> only
> > > > > place we
> > > > > > > > > could
> > > > > > > > > > > > > pass GPU info to the
> RichFunction/UserDefinedFunction.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > > > > > > > > isaac@paddlesoft.net
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> > > > sewen@apache.org
> > > > > > > wrote
> > > > > > > > > > ----
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can we somehow keep this out of the
> TaskManager
> > > > > services
> > > > > > > > > > > > > > > I fear that we could not. IMO, the
> GPUManager(or
> > > > > > > > > > > > > > > ExternalServicesManagers in future) is
> conceptually
> > > > > one of
> > > > > > > > the
> > > > > > > > > > task
> > > > > > > > > > > > > > > manager services, just like MemoryManager
> before
> > > > 1.10.
> > > > > > > > > > > > > > > - It maintains/holds the GPU resource at TM
> level
> > > and
> > > > > all
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > > > > > operators allocate the GPU resources from it.
> So,
> > > it
> > > > > should
> > > > > > > > be
> > > > > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > > > > - We could add a collection called
> > > > > ExternalResourceManagers
> > > > > > > > to
> > > > > > > > > > hold
> > > > > > > > > > > > > > > all managers of other external resources in the
> > > > future.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you help me understand why this needs the addition
> > > > > > > > > > > > > > in TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > > > > > Are you worried about the case when multiple Task
> > > > > > > > > > > > > > Executors run in the same JVM? That's not common, but
> > > > > > > > > > > > > > wouldn't it actually be good in that case to share the
> > > > > > > > > > > > > > GPU Manager, given that the GPU is shared?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > > > > In this FLIP, operators need the information.
> Thus,
> > > > we
> > > > > > > expose
> > > > > > > > > GPU
> > > > > > > > > > > > > > > information to the
> RuntimeContext/FunctionContext.
> > > > The
> > > > > slot
> > > > > > > > > > profile
> > > > > > > > > > > > is
> > > > > > > > > > > > > > > not aware of GPU resources as GPU is TM level
> > > > resource
> > > > > now.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can the GPU Manager be a "self contained"
> thing
> > > > that
> > > > > > > simply
> > > > > > > > > > takes
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > configuration, and then abstracts everything
> > > > > internally?
> > > > > > > > > > > > > > > Yes, we just pass the path/args of the discovery script
> > > > > > > > > > > > > > > and how many GPUs per TM to it. It takes the
> > > > > > > > > > > > > > > responsibility to get the GPU information and expose it
> > > > > > > > > > > > > > > to the RuntimeContext/FunctionContext of operators.
> > > > > > > > > > > > > > > Meanwhile, we'd better not allow operators to directly
> > > > > > > > > > > > > > > access the GPUManager; they should get what they need
> > > > > > > > > > > > > > > from the Context. We could then decouple the
> > > > > > > > > > > > > > > interface/implementation of GPUManager from the public
> > > > > > > > > > > > > > > API.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> > > > > > > > sewen@apache.org
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It sounds fine to initially start with GPU
> > > specific
> > > > > > > support
> > > > > > > > > and
> > > > > > > > > > > > think
> > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > generalizing this once we better understand
> the
> > > > > space.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > About the implementation suggested in
> FLIP-108:
> > > > > > > > > > > > > > > > - Can we somehow keep this out of the
> TaskManager
> > > > > > > services?
> > > > > > > > > > > > Anything
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > have to pull through all layers of the TM
> makes
> > > the
> > > > > TM
> > > > > > > > > > components
> > > > > > > > > > > > yet
> > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > > > > > -> do the slot profiles need information
> about
> > > the
> > > > > GPU?
> > > > > > > > > > > > > > > > -> Can the GPU Manager be a "self contained"
> > > thing
> > > > > that
> > > > > > > > > simply
> > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > the configuration, and then abstracts
> everything
> > > > > > > > internally?
> > > > > > > > > > > > > Operators
> > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're
> right,
> > > > > I'll add
> > > > > > > > > them
> > > > > > > > > > to
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > Regarding the general extended resource
> > > > mechanism,
> > > > > I
> > > > > > > > second
> > > > > > > > > > > > > Xintong's
> > > > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and
> > > > > > > > > > > > > > > > > ResourceSpec after we support fine-grained GPU
> > > > > > > > > > > > > > > > > scheduling. As a first-step proposal, I prefer not to
> > > > > > > > > > > > > > > > > include it in the scope of this FLIP.
> > > > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I
> > > > > > > > > > > > > > > > > understand correctly, it is just a code refactoring
> > > > > > > > > > > > > > > > > atm; we could extract the
> > > > > > > > > > > > > > > > > open/close/allocateExtendResources of GPUManager to
> > > > > > > > > > > > > > > > > that interface. If that is the case, +1 to do it
> > > > > > > > > > > > > > > > > during implementation.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark
> > > > supports
> > > > > a
> > > > > > > > > general
> > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > > Resource Scheduling" before and decided to
> > > > > introduce a
> > > > > > > > > common
> > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > >
> schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > to make it more extensible. I think the
> > > > "resource"
> > > > > is a
> > > > > > > > > > proper
> > > > > > > > > > > > > level
> > > > > > > > > > > > > > > > > to contain all the configs of extended
> > > resources.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo
> Huang <
> > > > > > > > > > > hxbks2ks@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > There is no doubt that GPU resource
> > > management
> > > > > > > support
> > > > > > > > > will
> > > > > > > > > > > > > greatly
> > > > > > > > > > > > > > > > > > facilitate the development of AI-related
> > > > > applications
> > > > > > > > by
> > > > > > > > > > > > PyFlink
> > > > > > > > > > > > > > > users.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Regarding the names of several GPU configurations, I
> > > > > > > > > > > > > > > > > > think it is better to delete the resource field so
> > > > > > > > > > > > > > > > > > that they are consistent with the names of other
> > > > > > > > > > > > > > > > > > resource-related configurations in
> > > > > > > > > > > > > > > > > > TaskManagerOptions.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > e.g.
> > > > > taskmanager.resource.gpu.discovery-script.path
> > > > > > > ->
> > > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <to...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline discussion about making
> > > > > > > > > > > > > > > > > > > > > the "GPU Support" a general "Extended Resource Support". We believe
> > > > > > > > > > > > > > > > > > > > > supporting extended resources through a general mechanism is definitely a
> > > > > > > > > > > > > > > > > > > > > good and extensible way. The reason we propose this FLIP with its scope
> > > > > > > > > > > > > > > > > > > > > narrowed down to GPU alone is mainly the concern about the extra effort
> > > > > > > > > > > > > > > > > > > > > and review capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > To come up with a good design for a general extended resource management
> > > > > > > > > > > > > > > > > > > > > mechanism, we would need to investigate further how people use different
> > > > > > > > > > > > > > > > > > > > > kinds of resources in practice. For GPU, we learnt such knowledge from the
> > > > > > > > > > > > > > > > > > > > > experts, Becket and his team members. But for FPGA, or other potential
> > > > > > > > > > > > > > > > > > > > > extended resources, we don't have such convenient information sources, so
> > > > > > > > > > > > > > > > > > > > > the investigation would require more effort, which I tend to think is
> > > > > > > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On the other hand, we also looked into how Spark supports a general
> > > > > > > > > > > > > > > > > > > > > "Custom Resource Scheduling". Assuming we want to have a similar general
> > > > > > > > > > > > > > > > > > > > > extended resource mechanism in the future, we believe that the current GPU
> > > > > > > > > > > > > > > > > > > > > support design can be easily extended, in an incremental way, without too
> > > > > > > > > > > > > > > > > > > > > much rework.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > - The most important part is probably user interfaces. Spark offers
> > > > > > > > > > > > > > > > > > > > > configuration options to define the amount, discovery script and vendor
> > > > > > > > > > > > > > > > > > > > > (on k8s) on a per-resource-type basis [1], which is very similar to what
> > > > > > > > > > > > > > > > > > > > > we proposed in this FLIP. I think it's not necessary to expose config
> > > > > > > > > > > > > > > > > > > > > options in the general way atm, since we do not have support for other
> > > > > > > > > > > > > > > > > > > > > resource types now. If later we decide to have per-resource-type config
> > > > > > > > > > > > > > > > > > > > > options, we can keep backwards compatibility with the currently proposed
> > > > > > > > > > > > > > > > > > > > > options through simple key mapping.
> > > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can change it to an "Extended
> > > > > > > > > > > > > > > > > > > > > Resource Manager" (or whatever it is called). That should be a pure
> > > > > > > > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are already fields for
> > > > > > > > > > > > > > > > > > > > > general extended resources. We can of course leverage them when supporting
> > > > > > > > > > > > > > > > > > > > > fine grained GPU scheduling. That is also not in the scope of this first
> > > > > > > > > > > > > > > > > > > > > step proposal, and would require FLIP-56 to be finished first.
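
The "simple key mapping" mentioned in the first bullet could look roughly
like the sketch below. This is only an illustration in plain Java, not
Flink's configuration API: the class KeyMappingConfig and the method
withFallbackKeys are hypothetical names, and which key would count as "new"
and which as "old" is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Looks up a config value under a new key, falling back to older key names. */
final class KeyMappingConfig {
    private final Map<String, String> values;                            // raw key -> value
    private final Map<String, String[]> fallbackKeys = new HashMap<>();  // new key -> old keys

    KeyMappingConfig(Map<String, String> values) {
        this.values = new HashMap<>(values);
    }

    /** Registers older key names that should still be honored for newKey. */
    KeyMappingConfig withFallbackKeys(String newKey, String... oldKeys) {
        fallbackKeys.put(newKey, oldKeys);
        return this;
    }

    /** Returns the value for key, preferring the new key over its fallbacks. */
    Optional<String> get(String key) {
        String direct = values.get(key);
        if (direct != null) {
            return Optional.of(direct);
        }
        for (String oldKey : fallbackKeys.getOrDefault(key, new String[0])) {
            String fallback = values.get(oldKey);
            if (fallback != null) {
                return Optional.of(fallback);
            }
        }
        return Optional.empty();
    }
}
```

With such a lookup, a job that still sets
taskmanager.resource.gpu.discovery-script.path would keep working after a
later, more general key is introduced, as long as the old key is registered
as a fallback for the new one.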
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate FLIP for the
> > > > > > > > > > > > > > > > > > > > > general extended resource mechanism, and keep it in mind when discussing
> > > > > > > > > > > > > > > > > > > > > and implementing the current one.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes total sense to generalize the
> > > > > > > > > > > > > > > > > > > > > > resource management to support custom resources. Having that allows users
> > > > > > > > > > > > > > > > > > > > > > to add new resources by themselves. The general resource management may
> > > > > > > > > > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type definition. It is supported by the extended
> > > > > > > > > > > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This will likely cover
> > > > > > > > > > > > > > > > > > > > > > the majority of the cases.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign the resources
> > > > > > > > > > > > > > > > > > > > > > to different tasks, operators, and so on. This may require two levels /
> > > > > > > > > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put into suitable slots.
> > > > > > > > > > > > > > > > > > > > > > It is done by the global RM and is not customizable right now.
> > > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource to the operators in TM, e.g.
> > > > > > > > > > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is needed assuming
> > > > > > > > > > > > > > > > > > > > > > the global RM does not distinguish individual resources of the same type.
> > > > > > > > > > > > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it should discover the
> > > > > > > > > > > > > > > > > > > > > > physical GPU information and bind/match them to each operator. Making this
> > > > > > > > > > > > > > > > > > > > > > general will fill in the missing piece to support custom resource type
> > > > > > > > > > > > > > > > > > > > > > definition. But I'd avoid calling it an "External Resource Manager" to
> > > > > > > > > > > > > > > > > > > > > > avoid confusion with RM, maybe something like "Operator Resource Assigner"
> > > > > > > > > > > > > > > > > > > > > > would be more accurate. So for each resource type users can have an
> > > > > > > > > > > > > > > > > > > > > > optional "Operator Resource Assigner" in the TM. For memory, users don't
> > > > > > > > > > > > > > > > > > > > > > need this, but for other extended resources, users may need that.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource Assigner" is achievable
> > > > > > > > > > > > > > > > > > > > > > in this FLIP. But I am also OK with having that in a separate FLIP because
> > > > > > > > > > > > > > > > > > > > > > the interface between the "Operator Resource Assigner" and operator may
> > > > > > > > > > > > > > > > > > > > > > take a while to settle down if we want to make it generic. But I think our
> > > > > > > > > > > > > > > > > > > > > > implementation should take this future work into consideration so that we
> > > > > > > > > > > > > > > > > > > > > > don't need to break backwards compatibility once we have that.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <sewen@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the mechanics of GPU-aware scheduling
> > > > > > > > > > > > > > > > > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the proposal is if it makes sense to look
> > > > > > > > > > > > > > > > > > > > > > > at the "GPU Manager" as an "External Resource Manager", and GPU is one
> > > > > > > > > > > > > > > > > > > > > > > such resource.
> > > > > > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that is how it
> > > > > > > > > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > > > > > > > > It has the advantage that it looks more extensible. Maybe there is a GPU
> > > > > > > > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an Alibaba
> > > > > > > > > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support is a must-have
> > > > > > > > > > > > > > > > > > > > > > > > for machine learning use cases. Actually it is one of the most frequently
> > > > > > > > > > > > > > > > > > > > > > > > asked questions from the users who are interested in using Flink for ML.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions on the wiki.
> > > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in the public
> > > > > > > > > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion, Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink is
> > > > > > > > > > > > > > > > > > > > > > > > > significant, especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think it's a
> > > > > > > > > > > > > > > > > > > > > > > > > very good first step for Flink's GPU support.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a task executor and
> > > > > > > > > > > > > > > > > > > > > > > > > > forward such requirements to the external resource managers (for
> > > > > > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of available GPU resources to operators.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover
> > > > > > > > > > > > > > > > > > > > > > > > > > and expose GPU resource information to the context of functions.
> > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU discovery, in which we provide
> > > > > > > > > > > > > > > > > > > > > > > > > > the privilege mode to help user to achieve worker-level isolation in
> > > > > > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward
> > > > > > > > > > > > > > > > > > > > > > > > > > to your feedbacks.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Yangze Guo <ka...@gmail.com>.
Thanks for the feedback, @Till and @Xintong.

Regarding separating the interfaces, I'm +1 as well.

Regarding the resource allocation interface: true, it's dangerous to
give user code that much access. Changing the return type to Map<String
key, String/Long value> makes sense to me. AFAIK, it is compatible
with all the first-party supported resources for Yarn/Kubernetes. It
would also free us from the potential dependency issue.
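
To make the key/value idea concrete, here is a minimal sketch of how the
ResourceManager side could assemble such a map into a request that it owns
itself. ExternalResourceAssembler and its method are hypothetical names, and
a plain Map<String, Long> stands in for the real ContainerRequest / Pod
resource section. The key "yarn.io/gpu" is the resource name Yarn uses for
GPUs; everything else is assumed for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** RM-side helper: merges driver-provided entries into the RM's own request. */
final class ExternalResourceAssembler {

    /**
     * Returns a new map containing the base resources plus the external ones.
     * Values arrive as strings and are validated here, so user code never
     * touches the deployment-specific request object.
     */
    static Map<String, Long> assemble(Map<String, Long> base, Map<String, String> external) {
        Map<String, Long> merged = new TreeMap<>(base);
        for (Map.Entry<String, String> entry : external.entrySet()) {
            long amount = Long.parseLong(entry.getValue());
            if (amount <= 0) {
                throw new IllegalArgumentException(
                        "Amount must be positive for " + entry.getKey());
            }
            if (merged.putIfAbsent(entry.getKey(), amount) != null) {
                throw new IllegalArgumentException(
                        "External resource must not override " + entry.getKey());
            }
        }
        return merged;
    }
}
```

Because the driver only hands over strings, it needs neither Hadoop nor the
Kubernetes client on its classpath, and the ResourceManager can reject
entries that would clobber memory or CPU settings.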

Best,
Yangze Guo

On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <to...@gmail.com> wrote:
>
> Thanks for updating the FLIP, Yangze.
>
> I agree with Till that we probably want to separate the K8s/Yarn decorator
> calls. Users can still configure one driver class, and we can use
> `instanceof` to check whether the driver implements K8s/Yarn specific
> interfaces.
>
> Moreover, I'm not sure about exposing the entire `ContainerRequest` / `Pod`
> (`AbstractKubernetesStepDecorator` directly manipulates the `Pod`) to user
> code. It gives user code more access than is needed for defining an external
> resource, which might cause problems. Instead, I would suggest having an
> interface like `Map<String key, String value>
> getYarn/KubernetesExternalResource()` and assembling the entries into
> `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <tr...@apache.org> wrote:
>
> > Hi everyone,
> >
> > I'm a bit late to the party. I think the current proposal looks good.
> >
> > Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> > would suggest not including the decorator calls for Kubernetes and Yarn in
> > the base interface. Instead, I would suggest segregating the deployment
> > specific decorator calls into separate interfaces. That way an
> > ExternalResourceDriver does not have to support all deployments from the
> > very beginning. Moreover, some resources might not be supported by a
> > specific deployment target, and the natural way to express this would be to
> > not implement the respective deployment specific interface.
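
The interface segregation described above could be sketched as follows. All
type names except ExternalResourceDriver are hypothetical, and the method
shapes are assumptions rather than the FLIP's final API. The resource key
"nvidia.com/gpu" is the name commonly exposed by the NVIDIA device plugin on
Kubernetes.

```java
import java.util.Collections;
import java.util.Map;

/** Base interface: nothing deployment-specific in here. */
interface ExternalResourceDriver {
    String resourceName();
}

/** Optional Yarn hook. Plain strings only, so no Hadoop classes are needed. */
interface YarnExternalResource {
    Map<String, String> getYarnExternalResource();
}

/** Optional Kubernetes hook. */
interface KubernetesExternalResource {
    Map<String, String> getKubernetesExternalResource();
}

/** Example driver that supports Kubernetes only. */
final class GpuDriver implements ExternalResourceDriver, KubernetesExternalResource {
    @Override
    public String resourceName() {
        return "gpu";
    }

    @Override
    public Map<String, String> getKubernetesExternalResource() {
        return Collections.singletonMap("nvidia.com/gpu", "1");
    }
}

/** What a deployment-specific ResourceManager could do with a driver. */
final class DeploymentSupport {
    static Map<String, String> forYarn(ExternalResourceDriver driver) {
        if (driver instanceof YarnExternalResource) {
            return ((YarnExternalResource) driver).getYarnExternalResource();
        }
        return Collections.emptyMap(); // driver does not support Yarn
    }

    static Map<String, String> forKubernetes(ExternalResourceDriver driver) {
        if (driver instanceof KubernetesExternalResource) {
            return ((KubernetesExternalResource) driver).getKubernetesExternalResource();
        }
        return Collections.emptyMap(); // driver does not support Kubernetes
    }
}
```

A driver that cannot run on a given deployment simply does not implement the
corresponding interface, and the base interface stays free of Hadoop types.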
> >
> > Moreover, having void
> > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > in the ExternalResourceDriver interface would require Hadoop on Flink's
> > classpath whenever the external resource driver is being used.
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> >
> > Cheers,
> > Till
> >
> > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <se...@apache.org> wrote:
> >
> > > Nice, thanks a lot!
> > >
> > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <ka...@gmail.com> wrote:
> > >
> > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > >
> > > > I've updated the FLIP accordingly. I did not add a
> > > > ResourceInfoProvider. Instead, I introduced the ExternalResourceDriver,
> > > > which takes responsibility for all relevant operations on both the RM
> > > > and TM sides.
> > > > After rethinking the decoupling of external resource management
> > > > from the TaskExecutor, I think we could do the same thing on the
> > > > ResourceManager side. We do not need to add specific allocation
> > > > logic to the ResourceManager each time we add a specific external
> > > > resource.
> > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > containerRequest.
> > > > - For Kubernetes, the ExternalResourceDriver could provide a decorator
> > > > for the TM pod.
> > > >
> > > > In this way, just like MetricReporter, we allow users to define their
> > > > custom ExternalResourceDriver. It is more extensible and fits the
> > > > separation of concerns. For more details, please take a look at [1].
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <se...@apache.org> wrote:
> > > > >
> > > > > This sounds good to go ahead from my side.
> > > > >
> > > > > I like the approach that Becket suggested - in that case the core
> > > > > abstraction that everyone would need to understand would be "external
> > > > > resource allocation" and the "ResourceInfoProvider", and the GPU specific
> > > > > code would be a specific implementation only known to that component that
> > > > > allocates the external resource. That fits the separation of concerns well.
> > > > >
> > > > > I also understand that it should not be over-engineered in the first
> > > > > version, so some simplification makes sense, and then gradually expand
> > > > > from there.
> > > > >
> > > > > So +1 to go ahead with what was suggested above (Xintong / Becket) from my
> > > > > side.
> > > > >
> > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <to...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for the comments, Stephan & Becket.
> > > > > >
> > > > > > @Stephan
> > > > > >
> > > > > > I see your concern, and I completely agree with you that we should first
> > > > > > think about the "library" / "plugin" / "extension" style if possible.
> > > > > >
> > > > > > > If GPUs are sliced and assigned during scheduling, there may be reason,
> > > > > > > although it looks that it would belong to the slot then. Is that what we
> > > > > > > are doing here?
> > > > > >
> > > > > >
> > > > > > In the current proposal, we do not have the GPUs sliced and assigned to
> > > > > > slots, because it could be problematic without dynamic slot allocation.
> > > > > > E.g., the number of GPUs might not be evenly divisible by the number of
> > > > > > slots.
> > > > > >
> > > > > > I think it makes sense to eventually have the GPUs assigned to slots. Even
> > > > > > then, we might still need a TM level GPUManager (or ResourceProvider like
> > > > > > Becket suggested). For memory, in each slot we can simply request the
> > > > > > amount of memory, leaving it to JVM / OS to decide which memory (address)
> > > > > > should be assigned. For GPU, and potentially other resources like FPGA, we
> > > > > > need to explicitly specify which GPU (index) should be used. Therefore, we
> > > > > > need some component at the TM level to coordinate which slot uses which
> > > > > > GPU.
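
The kind of TM-level coordination described above could be sketched like
this. It is only an illustration of the bookkeeping, not the proposed
GPUManager: the class name GpuIndexCoordinator, the use of string slot IDs,
and the first-free allocation policy are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Tracks which GPU indices on one task manager are held by which slot. */
final class GpuIndexCoordinator {
    private final Deque<Integer> freeIndices = new ArrayDeque<>();
    private final Map<String, List<Integer>> indicesBySlot = new HashMap<>();

    GpuIndexCoordinator(int gpuCount) {
        for (int i = 0; i < gpuCount; i++) {
            freeIndices.addLast(i);
        }
    }

    /** Hands out the requested number of concrete GPU indices to a slot. */
    synchronized List<Integer> allocate(String slotId, int amount) {
        if (indicesBySlot.containsKey(slotId)) {
            throw new IllegalStateException("Slot already holds GPUs: " + slotId);
        }
        if (amount > freeIndices.size()) {
            throw new IllegalStateException("Not enough free GPUs for " + slotId);
        }
        List<Integer> assigned = new ArrayList<>(amount);
        for (int i = 0; i < amount; i++) {
            assigned.add(freeIndices.pollFirst());
        }
        indicesBySlot.put(slotId, assigned);
        return Collections.unmodifiableList(assigned);
    }

    /** Returns a slot's GPU indices to the free pool; no-op for unknown slots. */
    synchronized void release(String slotId) {
        List<Integer> held = indicesBySlot.remove(slotId);
        if (held != null) {
            freeIndices.addAll(held);
        }
    }
}
```

Unlike memory, where only the amount matters, the coordinator has to remember
the identity of each device, which is why this state cannot live purely in
the scheduler's amount-based view.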
> > > > > >
> > > > > > IMO, unless we say Flink will not support slot-level GPU slicing at least
> > > > > > in the foreseeable future, I don't see a good way to avoid touching the TM
> > > > > > core. To that end, I think Becket's suggestion points in a good direction:
> > > > > > it supports more features (GPU, FPGA, etc.) with less coupling to the TM
> > > > > > core (which only needs to understand the general interfaces). The detailed
> > > > > > implementation for specific resource types can even be encapsulated as a
> > > > > > library.
> > > > > >
> > > > > > @Becket
> > > > > >
> > > > > > Thanks for sharing your thoughts on the final state. Regardless of the
> > > > > > details of how the interfaces should look, I think this is a really good
> > > > > > abstraction for supporting general resource types.
> > > > > >
> > > > > > I'd like to further clarify that the following three things are all that
> > > > > > the "Flink core" needs to understand.
> > > > > >
> > > > > >    - The *amount* of resource, for scheduling. Actually, we already have
> > > > > >    the Resource class in ResourceProfile and ResourceSpec for extended
> > > > > >    resources. It's just not really used.
> > > > > >    - The *info* that Flink provides to the operators / user code.
> > > > > >    - The *provider*, which generates the info based on the amount.
> > > > > >
> > > > > > The "core" does not need to understand the specific implementation details
> > > > > > of the above three. They can even be implemented in a 3rd-party library,
> > > > > > similar to how we allow users to define their custom MetricReporter.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <be...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Thanks for the comment, Stephan.
> > > > > > >
> > > > > > >   - If everything becomes a "core feature", it will make the
> > > project
> > > > hard
> > > > > > > > to develop in the future. Thinking "library" / "plugin" /
> > > > "extension"
> > > > > > > style
> > > > > > > > where possible helps.
> > > > > > >
> > > > > > >
> > > > > > > Completely agree. It is much more important to design a mechanism
> > > > than
> > > > > > > focusing on a specific case. Here is what I am thinking to fully
> > > > support
> > > > > > > custom resource management:
> > > > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to
> > > > define
> > > > > > the
> > > > > > > resource and the amount required. They will be used to find
> > > suitable
> > > > TMs
> > > > > > > slots to run the tasks. At this point, the resources are only
> > > > measured by
> > > > > > > amount, i.e. they do not have individual ID.
> > > > > > >
> > > > > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to
> > > > > > > > identify the individual resources and provide detailed information
> > > > > > > > about them, e.g. the GPU ID. This is important because the operator
> > > > > > > > may have to explicitly interact with the physical resource it uses.
> > > > > > > > The ResourceInfoProvider might look like something below.
> > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(OperatorId opId,
> > > > > > > ResourceProfile resourceProfile);
> > > > > > > }
> > > > > > >
> > > > > > > - There could be several "*ResourceInfoProvider*" configured on
> > the
> > > > TM to
> > > > > > > retrieve the information for different resources.
> > > > > > > - The TM will be responsible to assign those individual resources
> > > to
> > > > each
> > > > > > > operator according to their requested amount.
> > > > > > > - The operators will be able to get the ResourceInfo from their
> > > > > > > RuntimeContext.
> > > > > > >
> > > > > > > If we agree this is a reasonable final state. We can adapt the
> > > > current
> > > > > > FLIP
> > > > > > > to it. In fact it does not sound a big change to me. All the
> > > proposed
> > > > > > > configuration can be as is, it is just that Flink itself won't
> > care
> > > > about
> > > > > > > > them, instead a GPUInfoProvider implementing the
> > > ResourceInfoProvider
> > > > > > will
> > > > > > > use them.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <se...@apache.org>
> > > > wrote:
> > > > > > >
> > > > > > > > Hi all!
> > > > > > > >
> > > > > > > > The main point I wanted to throw into the discussion is the
> > > > following:
> > > > > > > >   - With more and more use cases, more and more tools go into
> > > Flink
> > > > > > > >   - If everything becomes a "core feature", it will make the
> > > > project
> > > > > > hard
> > > > > > > > to develop in the future. Thinking "library" / "plugin" /
> > > > "extension"
> > > > > > > style
> > > > > > > > where possible helps.
> > > > > > > >
> > > > > > > >   - A good thought experiment is always: How many future
> > > developers
> > > > > > have
> > > > > > > to
> > > > > > > > interact with this code (and possibly understand it partially),
> > > > even if
> > > > > > > the
> > > > > > > > features they touch have nothing to do with GPU support. If
> > many
> > > > > > > > contributors to unrelated features will have to touch it and
> > > > understand
> > > > > > > it,
> > > > > > > > then let's think if there is a different solution. Maybe there
> > is
> > > > not,
> > > > > > > but
> > > > > > > > then we should be sure why.
> > > > > > > >
> > > > > > > >   - That led me to raising this issue: If the GPU manager
> > > becomes a
> > > > > > core
> > > > > > > > service in the TaskManager, Environment, RuntimeContext, etc.
> > > then
> > > > > > > everyone
> > > > > > > > developing TM and streaming tasks need to understand the GPU
> > > > manager.
> > > > > > > That
> > > > > > > > seems oddly specific, is my impression.
> > > > > > > >
> > > > > > > > Access to configuration seems not the right reason to do that.
> > We
> > > > > > should
> > > > > > > > expose the Flink configuration from the RuntimeContext anyways.
> > > > > > > >
> > > > > > > > If GPUs are sliced and assigned during scheduling, there may be
> > > > reason,
> > > > > > > > although it looks that it would belong to the slot then. Is
> > that
> > > > what
> > > > > > we
> > > > > > > > are doing here?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stephan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > > > tonysong820@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > >
> > > > > > > > > IMO, eventually an operator should only see info of GPUs that
> > > are
> > > > > > > > dedicated
> > > > > > > > > for it, instead of all GPUs on the machine/container in the
> > > > current
> > > > > > > > design.
> > > > > > > > > It does not make sense to let the user who writes a UDF to
> > > worry
> > > > > > about
> > > > > > > > > coordination among multiple operators running on the same
> > > > machine.
> > > > > > And
> > > > > > > if
> > > > > > > > > we want to limit the GPU info an operator sees, we should not
> > > > let the
> > > > > > > > > operator to instantiate GPUManager, which means we have to
> > > expose
> > > > > > > > something
> > > > > > > > > through runtime context, either GPU info or some kind of
> > > limited
> > > > > > access
> > > > > > > > to
> > > > > > > > > the GPUManager.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> > > becket.qin@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > It probably makes sense for us to first agree on the final
> > > > state.
> > > > > > More
> > > > > > > > > > specifically, will the resource info be exposed through
> > > runtime
> > > > > > > context
> > > > > > > > > > eventually?
> > > > > > > > > >
> > > > > > > > > > If that is the final state and we have a seamless migration
> > > > story
> > > > > > > from
> > > > > > > > > this
> > > > > > > > > > FLIP to that final state, Personally I think it is OK to
> > > > expose the
> > > > > > > GPU
> > > > > > > > > > info in the runtime context.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > > > > tonysong820@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > @Yangze,
> > > > > > > > > > > I think what Stephan means (@Stephan, please correct me
> > if
> > > > I'm
> > > > > > > wrong)
> > > > > > > > > is
> > > > > > > > > > > that, we might not need to hold and maintain the
> > GPUManager
> > > > as a
> > > > > > > > > service
> > > > > > > > > > in
> > > > > > > > > > > TaskManagerServices or RuntimeContext. An alternative is
> > to
> > > > > > create
> > > > > > > /
> > > > > > > > > > > retrieve the GPUManager only in the operators that need
> > it,
> > > > e.g.,
> > > > > > > > with
> > > > > > > > > a
> > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > >
> > > > > > > > > > > @Stephan,
> > > > > > > > > > > I agree with you on excluding GPUManager from
> > > > > > TaskManagerServices.
> > > > > > > > > > >
> > > > > > > > > > >    - For the first step, where we provide unified
> > TM-level
> > > > GPU
> > > > > > > > > > information
> > > > > > > > > > >    to all operators, it should be fine to have operators
> > > > access /
> > > > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > > > >    - In future, we might have some more fine-grained GPU
> > > > > > > management,
> > > > > > > > > > where
> > > > > > > > > > >    we need to maintain GPUManager as a service and put
> > GPU
> > > > info
> > > > > > in
> > > > > > > > slot
> > > > > > > > > > >    profiles. But at least for now it's not necessary to
> > > > introduce
> > > > > > > > such
> > > > > > > > > > >    complexity.
> > > > > > > > > > >
> > > > > > > > > > > However, I have some concerns on excluding GPUManager
> > from
> > > > > > > > > RuntimeContext
> > > > > > > > > > > and let operators access it directly.
> > > > > > > > > > >
> > > > > > > > > > >    - Configurations needed for creating the GPUManager is
> > > not
> > > > > > > always
> > > > > > > > > > >    available for operators.
> > > > > > > > > > >    - If later we want to have fine-grained control over
> > GPU
> > > > > > (e.g.,
> > > > > > > > > > >    operators in each slot can only see GPUs reserved for
> > > that
> > > > > > > slot),
> > > > > > > > > the
> > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > >
> > > > > > > > > > > I would suggest to wrap the GPUManager behind
> > > RuntimeContext
> > > > and
> > > > > > > only
> > > > > > > > > > > expose the GPUInfo to users. For now, we can declare a
> > > method
> > > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a default
> > definition
> > > > that
> > > > > > > > calls
> > > > > > > > > > > `GPUManager.get()` to get the lazily-created GPUManager.
> > If
> > > > later
> > > > > > > we
> > > > > > > > > want
> > > > > > > > > > > to create / retrieve GPUManager in a different way, we
> > can
> > > > simply
> > > > > > > > > change
> > > > > > > > > > > how `getGPUInfo` is implemented, without needing to
> > change
> > > > any
> > > > > > > public
> > > > > > > > > > > interfaces.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> > > > karmagyz@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > Do you mean Minicluster? Yes, it makes sense to share
> > the
> > > > GPU
> > > > > > > > Manager
> > > > > > > > > > > > in such scenario.
> > > > > > > > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > > > > > > > GPUManager(ExternalResourceManagers) in TaskExecutor
> > > > instead of
> > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > >
> > > > > > > > > > > > Regarding the RuntimeContext/FunctionContext, it just
> > > > holds the
> > > > > > > GPU
> > > > > > > > > > > > info instead of the GPU Manager. AFAIK, it's the only
> > > > place we
> > > > > > > > could
> > > > > > > > > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > > > > > > > isaac@paddlesoft.net
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> > > sewen@apache.org
> > > > > > wrote
> > > > > > > > > ----
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can we somehow keep this out of the TaskManager
> > > > services
> > > > > > > > > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > > > > > > > > ExternalServicesManagers in future) is conceptually
> > > > one of
> > > > > > > the
> > > > > > > > > task
> > > > > > > > > > > > > > manager services, just like MemoryManager before
> > > 1.10.
> > > > > > > > > > > > > > - It maintains/holds the GPU resource at TM level
> > and
> > > > all
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > > > > operators allocate the GPU resources from it. So,
> > it
> > > > should
> > > > > > > be
> > > > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > > > - We could add a collection called
> > > > ExternalResourceManagers
> > > > > > > to
> > > > > > > > > hold
> > > > > > > > > > > > > > all managers of other external resources in the
> > > future.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you help me understand why this needs the addition in
> > > > > > > > > > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > > > > > Are you worried about the case when multiple Task Executors run in the
> > > > > > > > > > > > > > same JVM? That's not common, but wouldn't it actually be good in that
> > > > > > > > > > > > > > case to share the GPU Manager, given that the GPU is shared?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Stephan
> > > > > > > > > > > > >
> > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > > > In this FLIP, operators need the information. Thus,
> > > we
> > > > > > expose
> > > > > > > > GPU
> > > > > > > > > > > > > > information to the RuntimeContext/FunctionContext.
> > > The
> > > > slot
> > > > > > > > > profile
> > > > > > > > > > > is
> > > > > > > > > > > > > > not aware of GPU resources as GPU is TM level
> > > resource
> > > > now.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can the GPU Manager be a "self contained" thing
> > > that
> > > > > > simply
> > > > > > > > > takes
> > > > > > > > > > > the
> > > > > > > > > > > > > > configuration, and then abstracts everything
> > > > internally?
> > > > > > > > > > > > > > Yes, we just pass the path/args of the discover
> > > script
> > > > and
> > > > > > > how
> > > > > > > > > many
> > > > > > > > > > > > > > GPUs per TM to it. It takes the responsibility to
> > get
> > > > the
> > > > > > GPU
> > > > > > > > > > > > > > information and expose them to the
> > > > > > > > RuntimeContext/FunctionContext
> > > > > > > > > > of
> > > > > > > > > > > > > > Operators. Meanwhile, we'd better not allow
> > operators
> > > > to
> > > > > > > > directly
> > > > > > > > > > > > > > access GPUManager, it should get what they want
> > from
> > > > > > Context.
> > > > > > > > We
> > > > > > > > > > > could
> > > > > > > > > > > > > > then decouple the interface/implementation of
> > > > GPUManager
> > > > > > and
> > > > > > > > > Public
> > > > > > > > > > > > > > API.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> > > > > > > sewen@apache.org
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It sounds fine to initially start with GPU
> > specific
> > > > > > support
> > > > > > > > and
> > > > > > > > > > > think
> > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > generalizing this once we better understand the
> > > > space.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > > > > > > > - Can we somehow keep this out of the TaskManager
> > > > > > services?
> > > > > > > > > > > Anything
> > > > > > > > > > > > we
> > > > > > > > > > > > > > > have to pull through all layers of the TM makes
> > the
> > > > TM
> > > > > > > > > components
> > > > > > > > > > > yet
> > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > > > > -> do the slot profiles need information about
> > the
> > > > GPU?
> > > > > > > > > > > > > > > -> Can the GPU Manager be a "self contained"
> > thing
> > > > that
> > > > > > > > simply
> > > > > > > > > > > takes
> > > > > > > > > > > > > > > the configuration, and then abstracts everything
> > > > > > > internally?
> > > > > > > > > > > > Operators
> > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for all the feedbacks.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right,
> > > > I'll add
> > > > > > > > them
> > > > > > > > > to
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > Regarding the general extended resource
> > > mechanism,
> > > > I
> > > > > > > second
> > > > > > > > > > > > Xintong's
> > > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and
> > > > > > > ResourceSpec
> > > > > > > > > > after
> > > > > > > > > > > we
> > > > > > > > > > > > > > > > supporting fine-grained GPU scheduling. As a
> > > first
> > > > step
> > > > > > > > > > > proposal, I
> > > > > > > > > > > > > > > > prefer to not include it in the scope of this
> > > FLIP.
> > > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager",
> > if I
> > > > > > > > understand
> > > > > > > > > > > > > > > > > correctly, it is just a code refactoring atm, we
> > > could
> > > > > > > extract
> > > > > > > > > the
> > > > > > > > > > > > > > > > open/close/allocateExtendResources of
> > GPUManager
> > > to
> > > > > > that
> > > > > > > > > > > > interface. If
> > > > > > > > > > > > > > > > that is the case, +1 to do it during
> > > > implementation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark
> > > supports
> > > > a
> > > > > > > > general
> > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > Resource Scheduling" before and decided to
> > > > introduce a
> > > > > > > > common
> > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > configuration schema
> > > > > > > > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > to make it more extensible. I think the
> > > "resource"
> > > > is a
> > > > > > > > > proper
> > > > > > > > > > > > level
> > > > > > > > > > > > > > > > to contain all the configs of extended
> > resources.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > > > > > > > > hxbks2ks@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > There is no doubt that GPU resource
> > management
> > > > > > support
> > > > > > > > will
> > > > > > > > > > > > greatly
> > > > > > > > > > > > > > > > > facilitate the development of AI-related
> > > > applications
> > > > > > > by
> > > > > > > > > > > PyFlink
> > > > > > > > > > > > > > users.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Regarding the names of several GPU
> > > > configurations, I
> > > > > > > > think
> > > > > > > > > it
> > > > > > > > > > > is
> > > > > > > > > > > > > > better
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > delete the resource field makes it consistent
> > > > with
> > > > > > the
> > > > > > > > > names
> > > > > > > > > > of
> > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > resource-related configurations in
> > > > TaskManagerOption.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > e.g.
> > > > taskmanager.resource.gpu.discovery-script.path
> > > > > > ->
> > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <to...@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an
> > > > offline
> > > > > > > > > discussion
> > > > > > > > > > > > about
> > > > > > > > > > > > > > > > making
> > > > > > > > > > > > > > > > > > the "GPU Support" as some general "Extended
> > > > > > Resource
> > > > > > > > > > > Support".
> > > > > > > > > > > > We
> > > > > > > > > > > > > > > > believe
> > > > > > > > > > > > > > > > > > supporting extended resources in a general
> > > > > > mechanism
> > > > > > > is
> > > > > > > > > > > > definitely
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > and extensible way. The reason we propose
> > > this
> > > > FLIP
> > > > > > > > > > narrowing
> > > > > > > > > > > > its
> > > > > > > > > > > > > > scope
> > > > > > > > > > > > > > > > > > down to GPU alone, is mainly for the
> > concern
> > > on
> > > > > > extra
> > > > > > > > > > efforts
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > > review
> > > > > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > To come up with a well design on a general
> > > > extended
> > > > > > > > > > resource
> > > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > > > mechanism, we would need to investigate
> > more
> > > > on how
> > > > > > > > > people
> > > > > > > > > > > use
> > > > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > > kind of resources in practice. For GPU, we
> > > > learnt
> > > > > > > such
> > > > > > > > > > > > knowledge
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > experts, Becket and his team members. But
> > for
> > > > FPGA,
> > > > > > > or
> > > > > > > > > > other
> > > > > > > > > > > > > > potential
> > > > > > > > > > > > > > > > > > extended resources, we don't have such
> > > > convenient
> > > > > > > > > > information
> > > > > > > > > > > > > > sources,
> > > > > > > > > > > > > > > > > > making the investigation requires more
> > > efforts,
> > > > > > > which I
> > > > > > > > > > tend
> > > > > > > > > > > to
> > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On the other hand, we also looked into how
> > > > Spark
> > > > > > > > > supports a
> > > > > > > > > > > > general
> > > > > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > > > Resource Scheduling". Assuming we want to
> > > have
> > > > a
> > > > > > > > similar
> > > > > > > > > > > > general
> > > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > > resource mechanism in the future, we
> > believe
> > > > that
> > > > > > the
> > > > > > > > > > current
> > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > > > design can be easily extended, in an
> > > > incremental
> > > > > > way
> > > > > > > > > > without
> > > > > > > > > > > > too
> > > > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > > reworks.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - The most important part is probably user interfaces. Spark offers
> > > > > > > > > > > > > > > > > > > configuration options to define the amount, discovery script and vendor
> > > > > > > > > > > > > > > > > > > (on k8s) on a per-resource-type basis [1], which is very similar to what
> > > > > > > > > > > > > > > > > > > we proposed in this FLIP. I think it's not necessary to expose config
> > > > > > > > > > > > > > > > > > > options in the general way atm, since we do not have support for other
> > > > > > > > > > > > > > > > > > > resource types now. If we later decide to have per-resource-type config
> > > > > > > > > > > > > > > > > > > options, we can keep backwards compatibility with the currently
> > > > > > > > > > > > > > > > > > > proposed options through simple key mapping.
> > > > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we
> > can
> > > > > > change
> > > > > > > it
> > > > > > > > > to
> > > > > > > > > > a
> > > > > > > > > > > > > > > > "Extended
> > > > > > > > > > > > > > > > > > Resource Manager" (or whatever it is
> > called).
> > > > That
> > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > > a
> > > > > > > > > > > > > > pure
> > > > > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec,
> > there
> > > > are
> > > > > > > > already
> > > > > > > > > > > > > > fields for
> > > > > > > > > > > > > > > > > > general extended resource. We can of course
> > > > > > leverage
> > > > > > > > them
> > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > supporting
> > > > > > > > > > > > > > > > > > fine grained GPU scheduling. That is also
> > not
> > > > in
> > > > > > the
> > > > > > > > > scope
> > > > > > > > > > of
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > > > > step proposal, and would require FLIP-56 to
> > > be
> > > > > > > finished
> > > > > > > > > > > first.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate FLIP for
> > > > > > > > > > > > > > > > > > > the general extended resource mechanism, and keep it in mind when
> > > > > > > > > > > > > > > > > > > discussing and implementing the current one.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes
> > > total
> > > > > > sense
> > > > > > > to
> > > > > > > > > > > > generalize
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > resource management to support custom
> > > > resources.
> > > > > > > > Having
> > > > > > > > > > > that
> > > > > > > > > > > > > > allows
> > > > > > > > > > > > > > > > users
> > > > > > > > > > > > > > > > > > > to add new resources by themselves. The
> > > > general
> > > > > > > > > resource
> > > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 1. The custom resource type definition.
> > It
> > > is
> > > > > > > > supported
> > > > > > > > > > by
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This will likely
> > > > > > > > > > > > > > > > > > > > > cover the majority of the cases.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > > > > > > > > > > > > > > > > > > > resources to different tasks, operators, and so on. This may
> > > > > > > > > > > > > > > > > > > > > require two levels / steps:
> > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put into suitable
> > > > > > > > > > > > > > > > > > > > > slots. It is done by the global RM and is not customizable right
> > > > > > > > > > > > > > > > > > > > > now.
> > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource to the operators in
> > > > > > > > > > > > > > > > > > > > > TM. e.g. GPU 1 for operator A, GPU 2 for operator B. This step is
> > > > > > > > > > > > > > > > > > > > > needed assuming the global RM does not distinguish individual
> > > > > > > > > > > > > > > > > > > > > resources of the same type. It is true for memory, but not for
> > > > > > > > > > > > > > > > > > > > > GPU.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it should discover
> > > > > > > > > > > > > > > > > > > > > the physical GPU information and bind/match them to each
> > > > > > > > > > > > > > > > > > > > > operator. Making this general will fill in the missing piece to
> > > > > > > > > > > > > > > > > > > > > support custom resource type definition. But I'd avoid calling it
> > > > > > > > > > > > > > > > > > > > > an "External Resource Manager" to avoid confusion with RM, maybe
> > > > > > > > > > > > > > > > > > > > > something like "Operator Resource Assigner" would be more
> > > > > > > > > > > > > > > > > > > > > accurate. So for each resource type users can have an optional
> > > > > > > > > > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For memory, users don't
> > > > > > > > > > > > > > > > > > > > > need this, but for other extended resources, users may need that.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > > > > > > > > > > > > > > > > > > > achievable in this FLIP. But I am also OK with having that in a
> > > > > > > > > > > > > > > > > > > > > separate FLIP because the interface between the "Operator
> > > > > > > > > > > > > > > > > > > > > Resource Assigner" and operator may take a while to settle down
> > > > > > > > > > > > > > > > > > > > > if we want to make it generic. But I think our implementation
> > > > > > > > > > > > > > > > > > > > > should take this future work into consideration so that we don't
> > > > > > > > > > > > > > > > > > > > > need to break backwards compatibility once we have that.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <sewen@apache.org>
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > > > > > > > > > > > > > > > > > scheduling and GPU allocation, as I have no experience with that.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the proposal is whether it makes
> > > > > > > > > > > > > > > > > > > > > > sense to look at the "GPU Manager" as an "External Resource
> > > > > > > > > > > > > > > > > > > > > > Manager", and GPU is one such resource.
> > > > > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that
> > > > > > > > > > > > > > > > > > > > > > is how it is done there.
> > > > > > > > > > > > > > > > > > > > > > It has the advantage that it looks more extensible. Maybe there
> > > > > > > > > > > > > > > > > > > > > > is a GPU Resource, a specialized NVIDIA GPU Resource, an FPGA
> > > > > > > > > > > > > > > > > > > > > > Resource, an Alibaba TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket.qin@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management support is
> > > > > > > > > > > > > > > > > > > > > > > a must-have for machine learning use cases. Actually it is one
> > > > > > > > > > > > > > > > > > > > > > > of the most frequently asked questions from the users who are
> > > > > > > > > > > > > > > > > > > > > > > interested in using Flink for ML.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions on the wiki.
> > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in
> > > > > > > > > > > > > > > > > > > > > > > the public interface section.
> > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > <tonysong820@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion,
> > > > > > > > > > > > > > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink
> > > > > > > > > > > > > > > > > > > > > > > > is significant, especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> > > > > > > > > > > > > > > > > > > > > > > > think it's a very good first step for Flink's GPU support.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > > > <karmagyz@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add
> > > > > > > > > > > > > > > > > > > > > > > > > GPU support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a task executor
> > > > > > > > > > > > > > > > > > > > > > > > > and forward such requirements to the external resource
> > > > > > > > > > > > > > > > > > > > > > > > > managers (for Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of available GPU resources to
> > > > > > > > > > > > > > > > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to
> > > > > > > > > > > > > > > > > > > > > > > > > discover and expose GPU resource information to the context
> > > > > > > > > > > > > > > > > > > > > > > > > of functions.
> > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU discovery, in which we
> > > > > > > > > > > > > > > > > > > > > > > > > provide the privilege mode to help user to achieve
> > > > > > > > > > > > > > > > > > > > > > > > > worker-level isolation in standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki document [1].
> > > > > > > > > > > > > > > > > > > > > > > > > Looking forward to your feedbacks.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Xintong Song <to...@gmail.com>.
Thanks for updating the FLIP, Yangze.

I agree with Till that we probably want to separate the K8s/Yarn decorator
calls. Users can still configure one driver class, and we can use
`instanceof` to check whether the driver implements the K8s/Yarn-specific
interfaces.

Moreover, I'm not sure about exposing the entire `ContainerRequest` / `Pod`
(`AbstractKubernetesStepDecorator` manipulates the `Pod` directly) to user
code. It gives user code more access than it needs for defining an external
resource, which might cause problems. Instead, I would suggest having an
interface like `Map<String key, String value>
getYarn/KubernetesExternalResource()` and assembling the entries into the
`ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
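
To make the idea a bit more concrete, here is a rough sketch of how the
segregated interfaces and the `instanceof` check could fit together. This is
only an illustration, not the FLIP's actual API: the interface names
`YarnExternalResourceDriver` / `KubernetesExternalResourceDriver`, the
`resourceName()` method, the `GpuDriver` class and the helper class
`ExternalResourceSketch` are all assumptions made up for this example, and
the resource keys are just illustrative values.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical base interface: it carries no deployment-specific methods,
// so using a driver does not pull Hadoop or Kubernetes classes onto the classpath.
interface ExternalResourceDriver {
    String resourceName();
}

// Optional, deployment-specific interfaces (names are assumptions).
// A driver implements only the ones for the deployments it supports.
interface YarnExternalResourceDriver extends ExternalResourceDriver {
    Map<String, String> getYarnExternalResource();
}

interface KubernetesExternalResourceDriver extends ExternalResourceDriver {
    Map<String, String> getKubernetesExternalResource();
}

// Example driver that supports both deployments. It only returns plain
// key/value pairs and never sees a ContainerRequest or a Pod.
final class GpuDriver implements YarnExternalResourceDriver, KubernetesExternalResourceDriver {
    private final int amount;

    GpuDriver(int amount) {
        this.amount = amount;
    }

    @Override
    public String resourceName() {
        return "gpu";
    }

    @Override
    public Map<String, String> getYarnExternalResource() {
        // Illustrative key; the real key depends on the cluster setup.
        return Collections.singletonMap("yarn.io/gpu", String.valueOf(amount));
    }

    @Override
    public Map<String, String> getKubernetesExternalResource() {
        // Illustrative key; the real key depends on the device plugin in use.
        return Collections.singletonMap("nvidia.com/gpu", String.valueOf(amount));
    }
}

// Stand-in for the logic that would live in Yarn/KubernetesResourceManager:
// check with instanceof, then assemble the entries into the actual request.
final class ExternalResourceSketch {
    static Map<String, String> yarnResources(ExternalResourceDriver driver) {
        if (driver instanceof YarnExternalResourceDriver) {
            return ((YarnExternalResourceDriver) driver).getYarnExternalResource();
        }
        // The driver does not support Yarn: nothing to add to the request.
        return Collections.emptyMap();
    }

    static Map<String, String> kubernetesResources(ExternalResourceDriver driver) {
        if (driver instanceof KubernetesExternalResourceDriver) {
            return ((KubernetesExternalResourceDriver) driver).getKubernetesExternalResource();
        }
        return Collections.emptyMap();
    }
}
```

With this shape, a driver that does not implement the Yarn-specific interface
simply contributes nothing on Yarn, and the resource manager remains the only
place that touches `ContainerRequest` / `Pod`.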

Thank you~

Xintong Song



On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <tr...@apache.org> wrote:

> Hi everyone,
>
> I'm a bit late to the party. I think the current proposal looks good.
>
> Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
> would suggest to not include the decorator calls for Kubernetes and Yarn in
> the base interface. Instead I would suggest to segregate the deployment
> specific decorator calls into separate interfaces. That way an
> ExternalResourceDriver does not have to support all deployments from the
> very beginning. Moreover, some resources might not be supported by a
> specific deployment target and the natural way to express this would be to
> not implement the respective deployment specific interface.
>
> Moreover, having void
> addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> in the ExternalResourceDriver interface would require Hadoop on Flink's
> classpath whenever the external resource driver is being used.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
>
> Cheers,
> Till
>
> On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <se...@apache.org> wrote:
>
> > Nice, thanks a lot!
> >
> > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > >
> > > I've updated the FLIP accordingly. I do not add a
> > > ResourceInfoProvider. Instead, I introduce the ExternalResourceDriver,
> > > which takes the responsibility of all relevant operations on both RM
> > > and TM sides.
> > > After a rethink about decoupling the management of external resources
> > > from TaskExecutor, I think we could do the same thing on the
> > > ResourceManager side. We do not need to add a specific allocation
> > > logic to the ResourceManager each time we add a specific external
> > > resource.
> > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > containerRequest.
> > > - For Kubernetes, ExternalResourceDriver could provide a decorator for
> > > the TM pod.
> > >
> > > In this way, just like MetricReporter, we allow users to define their
> > > custom ExternalResourceDriver. It is more extensible and fits the
> > > separation of concerns. For more details, please take a look at [1].
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > This sounds good to go ahead from my side.
> > > >
> > > > I like the approach that Becket suggested - in that case the core
> > > > abstraction that everyone would need to understand would be "external
> > > > resource allocation" and the "ResourceInfoProvider", and the GPU
> > specific
> > > > code would be a specific implementation only known to that component
> > that
> > > > allocates the external resource. That fits the separation of concerns
> > > well.
> > > >
> > > > I also understand that it should not be over-engineered in the first
> > > > version, so some simplification makes sense, and then gradually
> expand
> > > from
> > > > there.
> > > >
> > > > So +1 to go ahead with what was suggested above (Xintong / Becket)
> from
> > > my
> > > > side.
> > > >
> > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks for the comments, Stephan & Becket.
> > > > >
> > > > > @Stephan
> > > > >
> > > > > I see your concern, and I completely agree with you that we should
> > > first
> > > > > think about the "library" / "plugin" / "extension" style if
> possible.
> > > > >
> > > > > If GPUs are sliced and assigned during scheduling, there may be
> > reason,
> > > > > > although it looks that it would belong to the slot then. Is that
> > > what we
> > > > > > are doing here?
> > > > >
> > > > >
> > > > > In the current proposal, we do not have the GPUs sliced and
> assigned
> > to
> > > > > slots, because it could be problematic without dynamic slot
> > allocation.
> > > > > E.g., the number of GPUs might not be evenly divisible by the
> number
> > of
> > > > > slots.
> > > > >
> > > > > I think it makes sense to eventually have the GPUs assigned to
> slots.
> > > Even
> > > > > then, we might still need a TM level GPUManager (or
> ResourceProvider
> > > like
> > > > > Becket suggested). For memory, in each slot we can simply request
> the
> > > > > amount of memory, leaving it to JVM / OS to decide which memory
> > > (address)
> > > > > should be assigned. For GPU, and potentially other resources like
> > > FPGA, we
> > > > > need to explicitly specify which GPU (index) should be used.
> > > Therefore, we
> > > > > need some component at the TM level to coordinate which slot uses
> > which
> > > > > GPU.
> > > > >
> > > > > IMO, unless we say Flink will not support slot-level GPU slicing at
> > > least
> > > > > in the foreseeable future, I don't see a good way to avoid touching
> > > the TM
> > > > > core. To that end, I think Becket's suggestion points to a good
> > > direction,
> > > > > that supports more features (GPU, FPGA, etc.) with less coupling to
> > > the TM
> > > > > core (only needs to understand the general interfaces). The
> detailed
> > > > > implementation for specific resource types can even be encapsulated
> > as
> > > a
> > > > > library.
> > > > >
> > > > > @Becket
> > > > >
> > > > > Thanks for sharing your thought on the final state. Despite the
> > > details how
> > > > > the interfaces should look like, I think this is a really good
> > > abstraction
> > > > > for supporting general resource types.
> > > > >
> > > > > I'd like to further clarify that, the following three things are
> all
> > > that
> > > > > the "Flink core" needs to understand.
> > > > >
> > > > >    - The *amount* of resource, for scheduling. Actually, we already
> > > have
> > > > >    the Resource class in ResourceProfile and ResourceSpec for
> > extended
> > > > >    resource. It's just not really used.
> > > > >    - The *info*, that Flink provides to the operators / user codes.
> > > > >    - The *provider*, which generates the info based on the amount.
> > > > >
> > > > > The "core" does not need to understand the specific implementation
> > > details
> > > > > of the above three. They can even be implemented in a 3rd-party
> > > library.
> > > > > Similar to how we allow users to define their custom
> MetricReporter.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Thanks for the comment, Stephan.
> > > > > >
> > > > > >   - If everything becomes a "core feature", it will make the
> > project
> > > hard
> > > > > > > to develop in the future. Thinking "library" / "plugin" /
> > > "extension"
> > > > > > style
> > > > > > > where possible helps.
> > > > > >
> > > > > >
> > > > > > Completely agree. It is much more important to design a mechanism
> > > than
> > > > > > focusing on a specific case. Here is what I am thinking to fully
> > > support
> > > > > > custom resource management:
> > > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to
> > > define
> > > > > the
> > > > > > resource and the amount required. They will be used to find
> > suitable
> > > TMs
> > > > > > slots to run the tasks. At this point, the resources are only
> > > measured by
> > > > > > amount, i.e. they do not have individual ID.
> > > > > >
> > > > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to
> > > > > > > identify and provide the detailed information of the individual
> > > > > > > resource, e.g. GPU ID. It is important because the operator may have
> > > > > > > to explicitly interact with the physical resource it uses. The
> > > > > > > ResourceInfoProvider might look something like below.
> > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > > > >         OperatorId opId, ResourceProfile resourceProfile);
> > > > > > > }
> > > > > >
> > > > > > - There could be several "*ResourceInfoProvider*" configured on
> the
> > > TM to
> > > > > > retrieve the information for different resources.
> > > > > > - The TM will be responsible to assign those individual resources
> > to
> > > each
> > > > > > operator according to their requested amount.
> > > > > > - The operators will be able to get the ResourceInfo from their
> > > > > > RuntimeContext.
> > > > > >
> > > > > > > If we agree this is a reasonable final state, we can adapt the
> > > > > > > current FLIP to it. In fact it does not sound like a big change to
> > > > > > > me. All the proposed configuration can stay as is; it is just that
> > > > > > > Flink itself won't care about them, instead a GPUInfoProvider
> > > > > > > implementing the ResourceInfoProvider will use them.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Hi all!
> > > > > > >
> > > > > > > The main point I wanted to throw into the discussion is the
> > > following:
> > > > > > >   - With more and more use cases, more and more tools go into
> > Flink
> > > > > > >   - If everything becomes a "core feature", it will make the
> > > project
> > > > > hard
> > > > > > > to develop in the future. Thinking "library" / "plugin" /
> > > "extension"
> > > > > > style
> > > > > > > where possible helps.
> > > > > > >
> > > > > > >   - A good thought experiment is always: How many future
> > developers
> > > > > have
> > > > > > to
> > > > > > > interact with this code (and possibly understand it partially),
> > > even if
> > > > > > the
> > > > > > > features they touch have nothing to do with GPU support. If
> many
> > > > > > > contributors to unrelated features will have to touch it and
> > > understand
> > > > > > it,
> > > > > > > then let's think if there is a different solution. Maybe there
> is
> > > not,
> > > > > > but
> > > > > > > then we should be sure why.
> > > > > > >
> > > > > > >   - That led me to raising this issue: If the GPU manager
> > becomes a
> > > > > core
> > > > > > > service in the TaskManager, Environment, RuntimeContext, etc.
> > then
> > > > > > everyone
> > > > > > > developing TM and streaming tasks need to understand the GPU
> > > manager.
> > > > > > That
> > > > > > > seems oddly specific, is my impression.
> > > > > > >
> > > > > > > Access to configuration seems not the right reason to do that.
> We
> > > > > should
> > > > > > > expose the Flink configuration from the RuntimeContext anyways.
> > > > > > >
> > > > > > > If GPUs are sliced and assigned during scheduling, there may be
> > > reason,
> > > > > > > although it looks that it would belong to the slot then. Is
> that
> > > what
> > > > > we
> > > > > > > are doing here?
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > > tonysong820@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > >
> > > > > > > > IMO, eventually an operator should only see info of GPUs that
> > are
> > > > > > > dedicated
> > > > > > > > for it, instead of all GPUs on the machine/container in the
> > > current
> > > > > > > design.
> > > > > > > > It does not make sense to let the user who writes a UDF to
> > worry
> > > > > about
> > > > > > > > coordination among multiple operators running on the same
> > > machine.
> > > > > And
> > > > > > if
> > > > > > > > we want to limit the GPU info an operator sees, we should not
> > > let the
> > > > > > > > operator to instantiate GPUManager, which means we have to
> > expose
> > > > > > > something
> > > > > > > > through runtime context, either GPU info or some kind of
> > limited
> > > > > access
> > > > > > > to
> > > > > > > > the GPUManager.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> > becket.qin@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > > It probably makes sense for us to first agree on the final
> > > > > > > > > > state. More specifically, will the resource info be exposed
> > > > > > > > > > through runtime context eventually?
> > > > > > > > >
> > > > > > > > > If that is the final state and we have a seamless migration
> > > story
> > > > > > from
> > > > > > > > this
> > > > > > > > > FLIP to that final state, Personally I think it is OK to
> > > expose the
> > > > > > GPU
> > > > > > > > > info in the runtime context.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > > > tonysong820@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > @Yangze,
> > > > > > > > > > I think what Stephan means (@Stephan, please correct me
> if
> > > I'm
> > > > > > wrong)
> > > > > > > > is
> > > > > > > > > > that, we might not need to hold and maintain the
> GPUManager
> > > as a
> > > > > > > > service
> > > > > > > > > in
> > > > > > > > > > TaskManagerServices or RuntimeContext. An alternative is
> to
> > > > > create
> > > > > > /
> > > > > > > > > > retrieve the GPUManager only in the operators that need
> it,
> > > e.g.,
> > > > > > > with
> > > > > > > > a
> > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > >
> > > > > > > > > > @Stephan,
> > > > > > > > > > I agree with you on excluding GPUManager from
> > > > > TaskManagerServices.
> > > > > > > > > >
> > > > > > > > > >    - For the first step, where we provide unified
> TM-level
> > > GPU
> > > > > > > > > information
> > > > > > > > > >    to all operators, it should be fine to have operators
> > > access /
> > > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > > >    - In future, we might have some more fine-grained GPU
> > > > > > management,
> > > > > > > > > where
> > > > > > > > > >    we need to maintain GPUManager as a service and put
> GPU
> > > info
> > > > > in
> > > > > > > slot
> > > > > > > > > >    profiles. But at least for now it's not necessary to
> > > introduce
> > > > > > > such
> > > > > > > > > >    complexity.
> > > > > > > > > >
> > > > > > > > > > However, I have some concerns on excluding GPUManager
> from
> > > > > > > > RuntimeContext
> > > > > > > > > > and let operators access it directly.
> > > > > > > > > >
> > > > > > > > > >    - Configurations needed for creating the GPUManager is
> > not
> > > > > > always
> > > > > > > > > >    available for operators.
> > > > > > > > > >    - If later we want to have fine-grained control over
> GPU
> > > > > (e.g.,
> > > > > > > > > >    operators in each slot can only see GPUs reserved for
> > that
> > > > > > slot),
> > > > > > > > the
> > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > >
> > > > > > > > > > I would suggest to wrap the GPUManager behind
> > RuntimeContext
> > > and
> > > > > > only
> > > > > > > > > > expose the GPUInfo to users. For now, we can declare a
> > method
> > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a default
> definition
> > > that
> > > > > > > calls
> > > > > > > > > > `GPUManager.get()` to get the lazily-created GPUManager.
> If
> > > later
> > > > > > we
> > > > > > > > want
> > > > > > > > > > to create / retrieve GPUManager in a different way, we
> can
> > > simply
> > > > > > > > change
> > > > > > > > > > how `getGPUInfo` is implemented, without needing to
> change
> > > any
> > > > > > public
> > > > > > > > > > interfaces.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> > > karmagyz@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > @Stephan
> > > > > > > > > > > Do you mean Minicluster? Yes, it makes sense to share
> the
> > > GPU
> > > > > > > Manager
> > > > > > > > > > > in such scenario.
> > > > > > > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > > > > > > GPUManager(ExternalResourceManagers) in TaskExecutor
> > > instead of
> > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > >
> > > > > > > > > > > Regarding the RuntimeContext/FunctionContext, it just
> > > holds the
> > > > > > GPU
> > > > > > > > > > > info instead of the GPU Manager. AFAIK, it's the only
> > > place we
> > > > > > > could
> > > > > > > > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Yangze Guo
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > > > > > > isaac@paddlesoft.net
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> > sewen@apache.org
> > > > > wrote
> > > > > > > > ----
> > > > > > > > > > > >
> > > > > > > > > > > > > > Can we somehow keep this out of the TaskManager
> > > services
> > > > > > > > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > > > > > > > ExternalServicesManagers in future) is conceptually
> > > one of
> > > > > > the
> > > > > > > > task
> > > > > > > > > > > > > manager services, just like MemoryManager before
> > 1.10.
> > > > > > > > > > > > > - It maintains/holds the GPU resource at TM level
> and
> > > all
> > > > > of
> > > > > > > the
> > > > > > > > > > > > > operators allocate the GPU resources from it. So,
> it
> > > should
> > > > > > be
> > > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > > - We could add a collection called
> > > ExternalResourceManagers
> > > > > > to
> > > > > > > > hold
> > > > > > > > > > > > > all managers of other external resources in the
> > future.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Can you help me understand why this needs the addition in
> > > > > > > > > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > > > > Are you worried about the case when multiple Task Executors run in
> > > > > > > > > > > > > the same JVM? That's not common, but wouldn't it actually be good in
> > > > > > > > > > > > > that case to share the GPU Manager, given that the GPU is shared?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Stephan
> > > > > > > > > > > >
> > > > > > > > > > > > ---------------------------
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > > In this FLIP, operators need the information. Thus,
> > we
> > > > > expose
> > > > > > > GPU
> > > > > > > > > > > > > information to the RuntimeContext/FunctionContext.
> > The
> > > slot
> > > > > > > > profile
> > > > > > > > > > is
> > > > > > > > > > > > > not aware of GPU resources as GPU is TM level
> > resource
> > > now.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Can the GPU Manager be a "self contained" thing
> > that
> > > > > simply
> > > > > > > > takes
> > > > > > > > > > the
> > > > > > > > > > > > > configuration, and then abstracts everything
> > > internally?
> > > > > > > > > > > > > Yes, we just pass the path/args of the discover
> > script
> > > and
> > > > > > how
> > > > > > > > many
> > > > > > > > > > > > > GPUs per TM to it. It takes the responsibility to
> get
> > > the
> > > > > GPU
> > > > > > > > > > > > > information and expose them to the
> > > > > > > RuntimeContext/FunctionContext
> > > > > > > > > of
> > > > > > > > > > > > > Operators. Meanwhile, we'd better not allow
> operators
> > > to
> > > > > > > directly
> > > > > > > > > > > > > access GPUManager, it should get what they want
> from
> > > > > Context.
> > > > > > > We
> > > > > > > > > > could
> > > > > > > > > > > > > then decouple the interface/implementation of
> > > GPUManager
> > > > > and
> > > > > > > > Public
> > > > > > > > > > > > > API.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> > > > > > sewen@apache.org
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It sounds fine to initially start with GPU
> specific
> > > > > support
> > > > > > > and
> > > > > > > > > > think
> > > > > > > > > > > > > about
> > > > > > > > > > > > > > generalizing this once we better understand the
> > > space.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > > > > > > - Can we somehow keep this out of the TaskManager
> > > > > services?
> > > > > > > > > > Anything
> > > > > > > > > > > we
> > > > > > > > > > > > > > have to pull through all layers of the TM makes
> the
> > > TM
> > > > > > > > components
> > > > > > > > > > yet
> > > > > > > > > > > > > more
> > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > > > -> do the slot profiles need information about
> the
> > > GPU?
> > > > > > > > > > > > > > -> Can the GPU Manager be a "self contained"
> thing
> > > that
> > > > > > > simply
> > > > > > > > > > takes
> > > > > > > > > > > > > > the configuration, and then abstracts everything
> > > > > > internally?
> > > > > > > > > > > Operators
> > > > > > > > > > > > > can
> > > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > > > > > > karmagyz@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for all the feedbacks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right,
> > > I'll add
> > > > > > > them
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > Regarding the general extended resource
> > mechanism,
> > > I
> > > > > > second
> > > > > > > > > > > Xintong's
> > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and
> > > > > > ResourceSpec
> > > > > > > > > after
> > > > > > > > > > we
> > > > > > > > > > > > > > > supporting fine-grained GPU scheduling. As a
> > first
> > > step
> > > > > > > > > > proposal, I
> > > > > > > > > > > > > > > prefer to not include it in the scope of this
> > FLIP.
> > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager",
> if I
> > > > > > > understand
> > > > > > > > > > > > > > > correctly, it just a code refactoring atm, we
> > could
> > > > > > extract
> > > > > > > > the
> > > > > > > > > > > > > > > open/close/allocateExtendResources of
> GPUManager
> > to
> > > > > that
> > > > > > > > > > > interface. If
> > > > > > > > > > > > > > > that is the case, +1 to do it during
> > > implementation.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > As Xintong said, we looked into how Spark supports a general "Custom
> > > > > > > > > > > > > > > > Resource Scheduling" before and decided to introduce a common resource
> > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > to make it more extensible. I think the "resource" is a proper level
> > > > > > > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > > > > > > > hxbks2ks@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > There is no doubt that GPU resource
> management
> > > > > support
> > > > > > > will
> > > > > > > > > > > greatly
> > > > > > > > > > > > > > > > facilitate the development of AI-related
> > > applications
> > > > > > by
> > > > > > > > > > PyFlink
> > > > > > > > > > > > > users.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Regarding the names of several GPU
> > > configurations, I
> > > > > > > think
> > > > > > > > it
> > > > > > > > > > is
> > > > > > > > > > > > > better
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > delete the resource field makes it consistent
> > > with
> > > > > the
> > > > > > > > names
> > > > > > > > > of
> > > > > > > > > > > other
> > > > > > > > > > > > > > > > resource-related configurations in
> > > TaskManagerOption.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > e.g.
> > > taskmanager.resource.gpu.discovery-script.path
> > > > > ->
> > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Xintong Song <to...@gmail.com> wrote on Wed, Mar 4, 2020 at 10:39 AM:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an
> > > offline
> > > > > > > > discussion
> > > > > > > > > > > about
> > > > > > > > > > > > > > > making
> > > > > > > > > > > > > > > > > the "GPU Support" as some general "Extended
> > > > > Resource
> > > > > > > > > > Support".
> > > > > > > > > > > We
> > > > > > > > > > > > > > > believe
> > > > > > > > > > > > > > > > > supporting extended resources in a general
> > > > > mechanism
> > > > > > is
> > > > > > > > > > > definitely
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > and extensible way. The reason we propose
> > this
> > > FLIP
> > > > > > > > > narrowing
> > > > > > > > > > > its
> > > > > > > > > > > > > scope
> > > > > > > > > > > > > > > > > down to GPU alone, is mainly for the
> concern
> > on
> > > > > extra
> > > > > > > > > efforts
> > > > > > > > > > > and
> > > > > > > > > > > > > > > review
> > > > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > To come up with a well design on a general
> > > extended
> > > > > > > > > resource
> > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > > mechanism, we would need to investigate
> more
> > > on how
> > > > > > > > people
> > > > > > > > > > use
> > > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > kind of resources in practice. For GPU, we
> > > learnt
> > > > > > such
> > > > > > > > > > > knowledge
> > > > > > > > > > > > > from
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > experts, Becket and his team members. But
> for
> > > FPGA,
> > > > > > or
> > > > > > > > > other
> > > > > > > > > > > > > potential
> > > > > > > > > > > > > > > > > extended resources, we don't have such
> > > convenient
> > > > > > > > > information
> > > > > > > > > > > > > sources,
> > > > > > > > > > > > > > > > > making the investigation requires more
> > efforts,
> > > > > > which I
> > > > > > > > > tend
> > > > > > > > > > to
> > > > > > > > > > > > > think
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On the other hand, we also looked into how
> > > Spark
> > > > > > > > supports a
> > > > > > > > > > > general
> > > > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > > Resource Scheduling". Assuming we want to
> > have
> > > a
> > > > > > > similar
> > > > > > > > > > > general
> > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > resource mechanism in the future, we
> believe
> > > that
> > > > > the
> > > > > > > > > current
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > > design can be easily extended, in an
> > > incremental
> > > > > way
> > > > > > > > > without
> > > > > > > > > > > too
> > > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > reworks.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - The most important part is probably user
> > > > > > interfaces.
> > > > > > > > > Spark
> > > > > > > > > > > > > offers
> > > > > > > > > > > > > > > > > configuration options to define the amount,
> > > > > discovery
> > > > > > > > > script
> > > > > > > > > > > and
> > > > > > > > > > > > > > > vendor
> > > > > > > > > > > > > > > > > (on
> > > > > > > > > > > > > > > > > k8s) in a per resource type bias [1], which
> > is
> > > very
> > > > > > > > similar
> > > > > > > > > > to
> > > > > > > > > > > > > what
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > proposed in this FLIP. I think it's not
> > > necessary
> > > > > to
> > > > > > > > expose
> > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > options
> > > > > > > > > > > > > > > > > in the general way atm, since we do not
> have
> > > > > supports
> > > > > > > for
> > > > > > > > > > other
> > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > types now. If later we decided to have per
> > > resource
> > > > > > > type
> > > > > > > > > > config
> > > > > > > > > > > > > > > > > options, we
> > > > > > > > > > > > > > > > > can have backwards compatibility on the
> > current
> > > > > > > proposed
> > > > > > > > > > > options
> > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > simple key mapping.
> > > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we
> can
> > > > > change
> > > > > > it
> > > > > > > > to
> > > > > > > > > a
> > > > > > > > > > > > > > > "Extended
> > > > > > > > > > > > > > > > > Resource Manager" (or whatever it is
> called).
> > > That
> > > > > > > should
> > > > > > > > > be
> > > > > > > > > > a
> > > > > > > > > > > > > pure
> > > > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec,
> there
> > > are
> > > > > > > already
> > > > > > > > > > > > > fields for
> > > > > > > > > > > > > > > > > general extended resource. We can of course
> > > > > leverage
> > > > > > > them
> > > > > > > > > > when
> > > > > > > > > > > > > > > > > supporting
> > > > > > > > > > > > > > > > > fine grained GPU scheduling. That is also
> not
> > > in
> > > > > the
> > > > > > > > scope
> > > > > > > > > of
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > > > step proposal, and would require FLIP-56 to
> > be
> > > > > > finished
> > > > > > > > > > first.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > To summary up, I agree with Becket that
> have
> > a
> > > > > > separate
> > > > > > > > > FLIP
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > general extended resource mechanism, and
> keep
> > > it in
> > > > > > > mind
> > > > > > > > > when
> > > > > > > > > > > > > > > discussing
> > > > > > > > > > > > > > > > > and implementing the current one.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes
> > total
> > > > > sense
> > > > > > to
> > > > > > > > > > > generalize
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > resource management to support custom
> > > resources.
> > > > > > > Having
> > > > > > > > > > that
> > > > > > > > > > > > > allows
> > > > > > > > > > > > > > > users
> > > > > > > > > > > > > > > > > > to add new resources by themselves. The
> > > general
> > > > > > > > resource
> > > > > > > > > > > > > management
> > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 1. The custom resource type definition.
> It
> > is
> > > > > > > supported
> > > > > > > > > by
> > > > > > > > > > > the
> > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > > resources in ResourceProfile and
> > > ResourceSpec.
> > > > > This
> > > > > > > > will
> > > > > > > > > > > likely
> > > > > > > > > > > > > cover
> > > > > > > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic,
> > > i.e. how
> > > > > > to
> > > > > > > > > assign
> > > > > > > > > > > the
> > > > > > > > > > > > > > > resources
> > > > > > > > > > > > > > > > > > to different tasks, operators, and so on.
> > > This
> > > > > may
> > > > > > > > > require
> > > > > > > > > > > two
> > > > > > > > > > > > > > > levels /
> > > > > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks
> > > are put
> > > > > > > into
> > > > > > > > > > > > > suitable
> > > > > > > > > > > > > > > > > slots.
> > > > > > > > > > > > > > > > > > It is done by the global RM and is not
> > > > > customizable
> > > > > > > > right
> > > > > > > > > > > now.
> > > > > > > > > > > > > > > > > > b. Operator level - map the exact
> resource
> > > to the
> > > > > > > > > operators
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > TM.
> > > > > > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator
> B.
> > > This
> > > > > > step
> > > > > > > > is
> > > > > > > > > > > needed
> > > > > > > > > > > > > > > assuming
> > > > > > > > > > > > > > > > > > the global RM does not distinguish
> > individual
> > > > > > > resources
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > > same
> > > > > > > > > > > > > > > type.
> > > > > > > > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b
> here.
> > > So it
> > > > > > > > should
> > > > > > > > > > > > > discover the
> > > > > > > > > > > > > > > > > > physical GPU information and bind/match
> > them
> > > to
> > > > > > each
> > > > > > > > > > > operators.
> > > > > > > > > > > > > > > Making
> > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > general will fill in the missing piece to
> > > support
> > > > > > > > custom
> > > > > > > > > > > resource
> > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > definition. But I'd avoid calling it a
> > > "External
> > > > > > > > Resource
> > > > > > > > > > > > > Manager" to
> > > > > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > > > > > confusion with RM, maybe something like
> > > "Operator
> > > > > > > > > Resource
> > > > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > be more accurate. So for each resource
> type
> > > users
> > > > > > can
> > > > > > > > > have
> > > > > > > > > > an
> > > > > > > > > > > > > > > optional
> > > > > > > > > > > > > > > > > > "Operator Resource Assigner" in the TM.
> For
> > > > > memory,
> > > > > > > > users
> > > > > > > > > > > don't
> > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > this,
> > > > > > > > > > > > > > > > > > but for other extended resources, users
> may
> > > need
> > > > > > > that.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator
> > > Resource
> > > > > > > > > Assigner"
> > > > > > > > > > > is
> > > > > > > > > > > > > > > achievable
> > > > > > > > > > > > > > > > > > in this FLIP. But I am also OK with
> having
> > > that
> > > > > in
> > > > > > a
> > > > > > > > > > separate
> > > > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > > the interface between the "Operator
> > Resource
> > > > > > > Assigner"
> > > > > > > > > and
> > > > > > > > > > > > > operator
> > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > take a while to settle down if we want to
> > > make it
> > > > > > > > > generic.
> > > > > > > > > > > But I
> > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > > > > implementation should take this future
> work
> > > into
> > > > > > > > > > > consideration so
> > > > > > > > > > > > > > > that we
> > > > > > > > > > > > > > > > > > don't need to break backwards
> compatibility
> > > once
> > > > > we
> > > > > > > > have
> > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan
> > Ewen
> > > <
> > > > > > > > > > > sewen@apache.org>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I cannot really give much input into
> the
> > > > > > mechanics
> > > > > > > of
> > > > > > > > > > > GPU-aware
> > > > > > > > > > > > > > > > > > scheduling
> > > > > > > > > > > > > > > > > > > and GPU allocation, as I have no
> > experience
> > > > > with
> > > > > > > > that.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > One thought I had when reading the
> > > proposal is
> > > > > if
> > > > > > > it
> > > > > > > > > > makes
> > > > > > > > > > > > > sense to
> > > > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > > the "GPU Manager" as an "External
> > Resource
> > > > > > > Manager",
> > > > > > > > > and
> > > > > > > > > > > GPU
> > > > > > > > > > > > > is one
> > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > > > > > > The way I understand the
> ResourceProfile
> > > and
> > > > > > > > > > ResourceSpec,
> > > > > > > > > > > > > that is
> > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > > > > It has the advantage that it looks more
> > > > > > extensible.
> > > > > > > > > Maybe
> > > > > > > > > > > > > there is
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU
> > > Resource,
> > > > > and
> > > > > > > FPGA
> > > > > > > > > > > > > Resource, a
> > > > > > > > > > > > > > > > > Alibaba
> > > > > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket
> > Qin <
> > > > > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU
> > resource
> > > > > > > management
> > > > > > > > > > > support
> > > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > > > > > > > for machine learning use cases.
> > Actually
> > > it
> > > > > is
> > > > > > > one
> > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > > > > mostly
> > > > > > > > > > > > > > > > > asked
> > > > > > > > > > > > > > > > > > > > question from the users who are
> > > interested in
> > > > > > > using
> > > > > > > > > > Flink
> > > > > > > > > > > > > for ML.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Some quick comments / questions to
> the
> > > wiki.
> > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should
> probably
> > > also
> > > > > be
> > > > > > > > > > > mentioned in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds
> GPU
> > > info
> > > > > > > also a
> > > > > > > > > > > public
> > > > > > > > > > > > > API?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM
> Xintong
> > > Song
> > > > > <
> > > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and
> > > kicking
> > > > > off
> > > > > > > the
> > > > > > > > > > > > > discussion,
> > > > > > > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting
> > > using
> > > > > of
> > > > > > > GPU
> > > > > > > > in
> > > > > > > > > > > Flink
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and
> > it
> > > > > looks
> > > > > > > good
> > > > > > > > > to
> > > > > > > > > > > me. I
> > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > > > > > > > very good first step for Flink's
> GPU
> > > > > > supports.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM
> > Yangze
> > > Guo
> > > > > <
> > > > > > > > > > > > > karmagyz@gmail.com
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > We would like to start a
> discussion
> > > > > thread
> > > > > > on
> > > > > > > > > > > "FLIP-108:
> > > > > > > > > > > > > Add
> > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the
> > > following
> > > > > > > > issues:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how
> many
> > > GPUs
> > > > > > in a
> > > > > > > > > task
> > > > > > > > > > > > > executor
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > forward such requirements to the
> > > external
> > > > > > > > > resource
> > > > > > > > > > > > > managers
> > > > > > > > > > > > > > > (for
> > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > > - Provide information of
> available
> > > GPU
> > > > > > > > resources
> > > > > > > > > to
> > > > > > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP
> > are
> > > as
> > > > > > > > follows:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource
> requirements
> > > to
> > > > > > > > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of
> > the
> > > task
> > > > > > > > manager
> > > > > > > > > > > > > services to
> > > > > > > > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > > > > > > > and expose GPU resource
> information
> > > to
> > > > > the
> > > > > > > > > context
> > > > > > > > > > of
> > > > > > > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > > > > > > - Introduce the default script
> for
> > > GPU
> > > > > > > > discovery,
> > > > > > > > > > in
> > > > > > > > > > > > > which we
> > > > > > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > > > > > > the privilege mode to help user
> to
> > > > > achieve
> > > > > > > > > > > worker-level
> > > > > > > > > > > > > > > isolation
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Please find more details in the
> > FLIP
> > > wiki
> > > > > > > > > document
> > > > > > > > > > > [1].
> > > > > > > > > > > > > > > Looking
> > > > > > > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > >

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Till Rohrmann <tr...@apache.org>.
Hi everyone,

I'm a bit late to the party. I think the current proposal looks good.

Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
would suggest not including the decorator calls for Kubernetes and Yarn in
the base interface. Instead, I would suggest segregating the
deployment-specific decorator calls into separate interfaces. That way an
ExternalResourceDriver does not have to support all deployments from the
very beginning. Moreover, some resources might not be supported by a
specific deployment target, and the natural way to express this would be to
not implement the respective deployment-specific interface.

Moreover, having void
addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
in the ExternalResourceDriver interface would require Hadoop on Flink's
classpath whenever the external resource driver is being used.
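
To make the suggestion concrete, here is a minimal sketch of what the
segregation could look like. It is only an illustration under stated
assumptions, not the interface from the FLIP: KubernetesDecorator,
YarnDecorator, PodSpec, ContainerRequestSpec and GpuDriver are hypothetical
names, and the two *Spec classes are plain stand-ins for the real Kubernetes
and Hadoop types so that the example compiles without either dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DriverSegregationSketch {

    /** Deployment-agnostic base interface: no Yarn or Kubernetes types appear here. */
    interface ExternalResourceDriver {
        String resourceName();
        long amount();
    }

    /** Hypothetical stand-in for a TaskManager pod spec (not a real Kubernetes client type). */
    static final class PodSpec {
        final Map<String, Long> resourceLimits = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for a Yarn container request (not the Hadoop class). */
    static final class ContainerRequestSpec {
        final Map<String, Long> resources = new LinkedHashMap<>();
    }

    /** Optional capability: implemented only by drivers that support Kubernetes. */
    interface KubernetesDecorator {
        void decoratePod(PodSpec pod);
    }

    /** Optional capability: would live in a Yarn-specific module, next to the Hadoop dependency. */
    interface YarnDecorator {
        void decorateRequest(ContainerRequestSpec request);
    }

    /** Example driver that supports Kubernetes only, so it simply omits YarnDecorator. */
    static final class GpuDriver implements ExternalResourceDriver, KubernetesDecorator {
        private final long amount;

        GpuDriver(long amount) {
            this.amount = amount;
        }

        @Override
        public String resourceName() {
            return "gpu";
        }

        @Override
        public long amount() {
            return amount;
        }

        @Override
        public void decoratePod(PodSpec pod) {
            // "nvidia.com/gpu" is used here purely as an illustrative resource key.
            pod.resourceLimits.put("nvidia.com/gpu", amount);
        }
    }

    /** Kubernetes side: applies every driver that opted in and returns the names of the rest. */
    static List<String> decorateForKubernetes(List<ExternalResourceDriver> drivers, PodSpec pod) {
        List<String> unsupported = new ArrayList<>();
        for (ExternalResourceDriver driver : drivers) {
            if (driver instanceof KubernetesDecorator) {
                ((KubernetesDecorator) driver).decoratePod(pod);
            } else {
                unsupported.add(driver.resourceName());
            }
        }
        return unsupported;
    }

    /** Yarn side: the same pattern against the Yarn-specific interface. */
    static List<String> decorateForYarn(
            List<ExternalResourceDriver> drivers, ContainerRequestSpec request) {
        List<String> unsupported = new ArrayList<>();
        for (ExternalResourceDriver driver : drivers) {
            if (driver instanceof YarnDecorator) {
                ((YarnDecorator) driver).decorateRequest(request);
            } else {
                unsupported.add(driver.resourceName());
            }
        }
        return unsupported;
    }

    public static void main(String[] args) {
        List<ExternalResourceDriver> drivers = new ArrayList<>();
        drivers.add(new GpuDriver(2));

        PodSpec pod = new PodSpec();
        System.out.println("Unsupported on Kubernetes: " + decorateForKubernetes(drivers, pod));
        System.out.println("Pod limits: " + pod.resourceLimits);
        System.out.println(
                "Unsupported on Yarn: " + decorateForYarn(drivers, new ContainerRequestSpec()));
    }
}
```

With this shape, a driver expresses "not supported on Yarn" by not
implementing the Yarn-specific interface, and the base interface never
mentions a Hadoop type, so only the Yarn module would need Hadoop on the
classpath.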

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink

Cheers,
Till

On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <se...@apache.org> wrote:

> Nice, thanks a lot!
>
> On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <ka...@gmail.com> wrote:
>
> > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> >
> > I've updated the FLIP accordingly. I do not add a
> > ResourceInfoProvider. Instead, I introduce the ExternalResourceDriver,
> > which takes the responsibility of all relevant operations on both RM
> > and TM sides.
> > After a rethink about decoupling the management of external resources
> > from TaskExecutor, I think we could do the same thing on the
> > ResourceManager side. We do not need to add a specific allocation
> > logic to the ResourceManager each time we add a specific external
> > resource.
> > - For Yarn, we need the ExternalResourceDriver to edit the
> > containerRequest.
> > - For Kubernetes, ExternalResourceDriver could provide a decorator for
> > the TM pod.
> >
> > In this way, just like MetricReporter, we allow users to define their
> > custom ExternalResourceDriver. It is more extensible and fits the
> > separation of concerns. For more details, please take a look at [1].
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> >
> > Best,
> > Yangze Guo
> >
> > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <se...@apache.org> wrote:
> > >
> > > This sounds good to go ahead from my side.
> > >
> > > I like the approach that Becket suggested - in that case the core
> > > abstraction that everyone would need to understand would be "external
> > > resource allocation" and the "ResourceInfoProvider", and the GPU
> specific
> > > code would be a specific implementation only known to that component
> that
> > > allocates the external resource. That fits the separation of concerns
> > well.
> > >
> > > I also understand that it should not be over-engineered in the first
> > > version, so some simplification makes sense, and then gradually expand
> > from
> > > there.
> > >
> > > So +1 to go ahead with what was suggested above (Xintong / Becket) from
> > my
> > > side.
> > >
> > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <to...@gmail.com>
> > wrote:
> > >
> > > > Thanks for the comments, Stephan & Becket.
> > > >
> > > > @Stephan
> > > >
> > > > I see your concern, and I completely agree with you that we should
> > first
> > > > think about the "library" / "plugin" / "extension" style if possible.
> > > >
> > > > If GPUs are sliced and assigned during scheduling, there may be
> reason,
> > > > > although it looks that it would belong to the slot then. Is that
> > what we
> > > > > are doing here?
> > > >
> > > >
> > > > In the current proposal, we do not have the GPUs sliced and assigned
> to
> > > > slots, because it could be problematic without dynamic slot
> allocation.
> > > > E.g., the number of GPUs might not be evenly divisible by the number
> of
> > > > slots.
> > > >
> > > > I think it makes sense to eventually have the GPUs assigned to slots.
> > Even
> > > > then, we might still need a TM level GPUManager (or ResourceProvider
> > like
> > > > Becket suggested). For memory, in each slot we can simply request the
> > > > amount of memory, leaving it to JVM / OS to decide which memory
> > (address)
> > > > should be assigned. For GPU, and potentially other resources like
> > FPGA, we
> > > > need to explicitly specify which GPU (index) should be used.
> > Therefore, we
> > > > need some component at the TM level to coordinate which slot uses
> which
> > > > GPU.
> > > >
> > > > IMO, unless we say Flink will not support slot-level GPU slicing at
> > least
> > > > in the foreseeable future, I don't see a good way to avoid touching
> > the TM
> > > > core. To that end, I think Becket's suggestion points to a good
> > direction,
> > > > that supports more features (GPU, FPGA, etc.) with less coupling to
> > the TM
> > > > core (only needs to understand the general interfaces). The detailed
> > > > implementation for specific resource types can even be encapsulated
> as
> > a
> > > > library.
> > > >
> > > > @Becket
> > > >
> > > > Thanks for sharing your thought on the final state. Despite the
> > details how
> > > > the interfaces should look like, I think this is a really good
> > abstraction
> > > > for supporting general resource types.
> > > >
> > > > I'd like to further clarify that, the following three things are all
> > that
> > > > the "Flink core" needs to understand.
> > > >
> > > >    - The *amount* of resource, for scheduling. Actually, we already
> > have
> > > >    the Resource class in ResourceProfile and ResourceSpec for
> extended
> > > >    resource. It's just not really used.
> > > >    - The *info*, that Flink provides to the operators / user codes.
> > > >    - The *provider*, which generates the info based on the amount.
> > > >
> > > > The "core" does not need to understand the specific implementation
> > details
> > > > of the above three. They can even be implemented in a 3rd-party
> > library.
> > > > Similar to how we allow users to define their custom MetricReporter.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <be...@gmail.com>
> > wrote:
> > > >
> > > > > Thanks for the comment, Stephan.
> > > > >
> > > > >   - If everything becomes a "core feature", it will make the
> project
> > hard
> > > > > > to develop in the future. Thinking "library" / "plugin" /
> > "extension"
> > > > > style
> > > > > > where possible helps.
> > > > >
> > > > >
> > > > > Completely agree. It is much more important to design a mechanism
> > than
> > > > > focusing on a specific case. Here is what I am thinking to fully
> > support
> > > > > custom resource management:
> > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to
> > define
> > > > the
> > > > > resource and the amount required. They will be used to find
> suitable
> > TMs
> > > > > slots to run the tasks. At this point, the resources are only
> > measured by
> > > > > amount, i.e. they do not have individual ID.
> > > > >
> > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to
> > > > > identify and provide the detailed information of the individual
> > > > > resource, e.g. the GPU ID. It is important because the operator may
> > > > > have to explicitly interact with the physical resource it uses. The
> > > > > ResourceInfoProvider might look something like the one below.
> > > > > interface ResourceInfoProvider<INFO> {
> > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > >             OperatorId opId, ResourceProfile resourceProfile);
> > > > > }
> > > > >
> > > > > - There could be several "*ResourceInfoProvider*" configured on the
> > TM to
> > > > > retrieve the information for different resources.
> > > > > - The TM will be responsible to assign those individual resources
> to
> > each
> > > > > operator according to their requested amount.
> > > > > - The operators will be able to get the ResourceInfo from their
> > > > > RuntimeContext.
> > > > >
> > > > > If we agree this is a reasonable final state, we can adapt the
> > > > > current FLIP to it. In fact, it does not sound like a big change to
> > > > > me. All the proposed configuration can stay as is; it is just that
> > > > > Flink itself won't care about it. Instead, a GPUInfoProvider
> > > > > implementing the ResourceInfoProvider will use it.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <se...@apache.org>
> > wrote:
> > > > >
> > > > > > Hi all!
> > > > > >
> > > > > > The main point I wanted to throw into the discussion is the
> > following:
> > > > > >   - With more and more use cases, more and more tools go into
> Flink
> > > > > >   - If everything becomes a "core feature", it will make the
> > project
> > > > hard
> > > > > > to develop in the future. Thinking "library" / "plugin" /
> > "extension"
> > > > > style
> > > > > > where possible helps.
> > > > > >
> > > > > >   - A good thought experiment is always: How many future
> developers
> > > > have
> > > > > to
> > > > > > interact with this code (and possibly understand it partially),
> > even if
> > > > > the
> > > > > > features they touch have nothing to do with GPU support. If many
> > > > > > contributors to unrelated features will have to touch it and
> > understand
> > > > > it,
> > > > > > then let's think if there is a different solution. Maybe there is
> > not,
> > > > > but
> > > > > > then we should be sure why.
> > > > > >
> > > > > >   - That led me to raising this issue: If the GPU manager
> becomes a
> > > > core
> > > > > > service in the TaskManager, Environment, RuntimeContext, etc.
> then
> > > > > everyone
> > > > > > developing TM and streaming tasks need to understand the GPU
> > manager.
> > > > > That
> > > > > > seems oddly specific, is my impression.
> > > > > >
> > > > > > Access to configuration seems not the right reason to do that. We
> > > > should
> > > > > > expose the Flink configuration from the RuntimeContext anyways.
> > > > > >
> > > > > > If GPUs are sliced and assigned during scheduling, there may be
> > reason,
> > > > > > although it looks that it would belong to the slot then. Is that
> > what
> > > > we
> > > > > > are doing here?
> > > > > >
> > > > > > Best,
> > > > > > Stephan
> > > > > >
> > > > > >
> > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > tonysong820@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > >  Thanks for the feedback, Becket.
> > > > > > >
> > > > > > > IMO, eventually an operator should only see info of GPUs that
> are
> > > > > > dedicated
> > > > > > > for it, instead of all GPUs on the machine/container in the
> > current
> > > > > > design.
> > > > > > > It does not make sense to let the user who writes a UDF to
> worry
> > > > about
> > > > > > > coordination among multiple operators running on the same
> > machine.
> > > > And
> > > > > if
> > > > > > > we want to limit the GPU info an operator sees, we should not
> > let the
> > > > > > > operator to instantiate GPUManager, which means we have to
> expose
> > > > > > something
> > > > > > > through runtime context, either GPU info or some kind of
> limited
> > > > access
> > > > > > to
> > > > > > > the GPUManager.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> becket.qin@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > It probably makes sense for us to first agree on the final state.
> > > > > > > > More specifically, will the resource info be exposed through the
> > > > > > > > runtime context eventually?
> > > > > > > >
> > > > > > > > If that is the final state and we have a seamless migration story
> > > > > > > > from this FLIP to that final state, personally I think it is OK to
> > > > > > > > expose the GPU info in the runtime context.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > > tonysong820@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > @Yangze,
> > > > > > > > > I think what Stephan means (@Stephan, please correct me if
> > I'm
> > > > > wrong)
> > > > > > > is
> > > > > > > > > that, we might not need to hold and maintain the GPUManager
> > as a
> > > > > > > service
> > > > > > > > in
> > > > > > > > > TaskManagerServices or RuntimeContext. An alternative is to
> > > > create
> > > > > /
> > > > > > > > > retrieve the GPUManager only in the operators that need it,
> > e.g.,
> > > > > > with
> > > > > > > a
> > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > >
> > > > > > > > > @Stephan,
> > > > > > > > > I agree with you on excluding GPUManager from
> > > > TaskManagerServices.
> > > > > > > > >
> > > > > > > > >    - For the first step, where we provide unified TM-level
> > GPU
> > > > > > > > information
> > > > > > > > >    to all operators, it should be fine to have operators
> > access /
> > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > >    - In future, we might have some more fine-grained GPU
> > > > > management,
> > > > > > > > where
> > > > > > > > >    we need to maintain GPUManager as a service and put GPU
> > info
> > > > in
> > > > > > slot
> > > > > > > > >    profiles. But at least for now it's not necessary to
> > introduce
> > > > > > such
> > > > > > > > >    complexity.
> > > > > > > > >
> > > > > > > > > However, I have some concerns on excluding GPUManager from
> > > > > > > RuntimeContext
> > > > > > > > > and let operators access it directly.
> > > > > > > > >
> > > > > > > > >    - Configurations needed for creating the GPUManager are not
> > > > > > > > >    always available for operators.
> > > > > > > > >    - If later we want to have fine-grained control over GPU
> > > > (e.g.,
> > > > > > > > >    operators in each slot can only see GPUs reserved for
> that
> > > > > slot),
> > > > > > > the
> > > > > > > > >    approach cannot be easily extended.
> > > > > > > > >
> > > > > > > > > I would suggest to wrap the GPUManager behind
> RuntimeContext
> > and
> > > > > only
> > > > > > > > > expose the GPUInfo to users. For now, we can declare a
> method
> > > > > > > > > `getGPUInfo()` in RuntimeContext, with a default definition
> > that
> > > > > > calls
> > > > > > > > > `GPUManager.get()` to get the lazily-created GPUManager. If
> > later
> > > > > we
> > > > > > > want
> > > > > > > > > to create / retrieve GPUManager in a different way, we can
> > simply
> > > > > > > change
> > > > > > > > > how `getGPUInfo` is implemented, without needing to change
> > any
> > > > > public
> > > > > > > > > interfaces.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> > karmagyz@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > @Stephan
> > > > > > > > > > Do you mean Minicluster? Yes, it makes sense to share the
> > GPU
> > > > > > Manager
> > > > > > > > > > in such scenario.
> > > > > > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > > > > > GPUManager(ExternalResourceManagers) in TaskExecutor
> > instead of
> > > > > > > > > > TaskManagerServices.
> > > > > > > > > >
> > > > > > > > > > Regarding the RuntimeContext/FunctionContext, it just
> > holds the
> > > > > GPU
> > > > > > > > > > info instead of the GPU Manager. AFAIK, it's the only
> > place we
> > > > > > could
> > > > > > > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > > > > > isaac@paddlesoft.net
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> sewen@apache.org
> > > > wrote
> > > > > > > ----
> > > > > > > > > > >
> > > > > > > > > > > > > Can we somehow keep this out of the TaskManager
> > services
> > > > > > > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > > > > > > ExternalServicesManagers in future) is conceptually
> > one of
> > > > > the
> > > > > > > task
> > > > > > > > > > > > manager services, just like MemoryManager before
> 1.10.
> > > > > > > > > > > > - It maintains/holds the GPU resource at TM level and
> > all
> > > > of
> > > > > > the
> > > > > > > > > > > > operators allocate the GPU resources from it. So, it
> > should
> > > > > be
> > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > - We could add a collection called
> > ExternalResourceManagers
> > > > > to
> > > > > > > hold
> > > > > > > > > > > > all managers of other external resources in the
> future.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Can you help me understand why this needs the addition in
> > > > > > > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > > Are you worried about the case when multiple Task
> > Executors
> > > > run
> > > > > > in
> > > > > > > > the
> > > > > > > > > > same
> > > > > > > > > > > JVM? That's not common, but wouldn't it actually be
> good
> > in
> > > > > that
> > > > > > > case
> > > > > > > > > to
> > > > > > > > > > > share the GPU Manager, given that the GPU is shared?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Stephan
> > > > > > > > > > >
> > > > > > > > > > > ---------------------------
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > In this FLIP, operators need the information. Thus,
> we
> > > > expose
> > > > > > GPU
> > > > > > > > > > > > information to the RuntimeContext/FunctionContext.
> The
> > slot
> > > > > > > profile
> > > > > > > > > is
> > > > > > > > > > > > not aware of GPU resources as GPU is TM level
> resource
> > now.
> > > > > > > > > > > >
> > > > > > > > > > > > > Can the GPU Manager be a "self contained" thing
> that
> > > > simply
> > > > > > > takes
> > > > > > > > > the
> > > > > > > > > > > > configuration, and then abstracts everything
> > internally?
> > > > > > > > > > > > Yes, we just pass the path/args of the discover
> script
> > and
> > > > > how
> > > > > > > many
> > > > > > > > > > > > GPUs per TM to it. It takes the responsibility to get
> > the
> > > > GPU
> > > > > > > > > > > > information and expose them to the
> > > > > > RuntimeContext/FunctionContext
> > > > > > > > of
> > > > > > > > > > > > Operators. Meanwhile, we'd better not allow operators
> > to
> > > > > > directly
> > > > > > > > > > > > access GPUManager, it should get what they want from
> > > > Context.
> > > > > > We
> > > > > > > > > could
> > > > > > > > > > > > then decouple the interface/implementation of
> > GPUManager
> > > > and
> > > > > > > Public
> > > > > > > > > > > > API.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> > > > > sewen@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > It sounds fine to initially start with GPU specific
> > > > support
> > > > > > and
> > > > > > > > > think
> > > > > > > > > > > > about
> > > > > > > > > > > > > generalizing this once we better understand the
> > space.
> > > > > > > > > > > > >
> > > > > > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > > > > > - Can we somehow keep this out of the TaskManager
> > > > services?
> > > > > > > > > Anything
> > > > > > > > > > we
> > > > > > > > > > > > > have to pull through all layers of the TM makes the
> > TM
> > > > > > > components
> > > > > > > > > yet
> > > > > > > > > > > > more
> > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > >
> > > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > > -> do the slot profiles need information about the
> > GPU?
> > > > > > > > > > > > > -> Can the GPU Manager be a "self contained" thing
> > that
> > > > > > simply
> > > > > > > > > takes
> > > > > > > > > > > > > the configuration, and then abstracts everything
> > > > > internally?
> > > > > > > > > > Operators
> > > > > > > > > > > > can
> > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > > > > > karmagyz@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right,
> > I'll add
> > > > > > them
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > Regarding the general extended resource
> mechanism,
> > I
> > > > > second
> > > > > > > > > > Xintong's
> > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > - It's better to leverage ResourceProfile and
> > > > > ResourceSpec
> > > > > > > > after
> > > > > > > > > we
> > > > > > > > > > > > > > supporting fine-grained GPU scheduling. As a
> first
> > step
> > > > > > > > > proposal, I
> > > > > > > > > > > > > > prefer to not include it in the scope of this
> FLIP.
> > > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > > > > > > > > > correctly, it is just a code refactoring atm: we could extract the
> > > > > > > > > > > > > > open/close/allocateExtendResources of GPUManager to that
> > > > > > > > > > > > > > interface. If that is the case, +1 to do it during implementation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > As Xintong said, we looked into how Spark
> supports
> > a
> > > > > > general
> > > > > > > > > > "Custom
> > > > > > > > > > > > > > Resource Scheduling" before and decided to
> > introduce a
> > > > > > common
> > > > > > > > > > resource
> > > > > > > > > > > > > > configuration schema
> > > > > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > to make it more extensible. I think the
> "resource"
> > is a
> > > > > > > proper
> > > > > > > > > > level
> > > > > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > > > > > > hxbks2ks@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > There is no doubt that GPU resource management
> > > > support
> > > > > > will
> > > > > > > > > > greatly
> > > > > > > > > > > > > > > facilitate the development of AI-related
> > applications
> > > > > by
> > > > > > > > > PyFlink
> > > > > > > > > > > > users.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regarding the names of several GPU configurations, I think it is
> > > > > > > > > > > > > > > better to delete the resource field, which makes them consistent
> > > > > > > > > > > > > > > with the names of other resource-related configurations in
> > > > > > > > > > > > > > > TaskManagerOption.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song <to...@gmail.com>
> > 于2020年3月4日周三
> > > > > > > 上午10:39写道:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an
> > offline
> > > > > > > discussion
> > > > > > > > > > about
> > > > > > > > > > > > > > making
> > > > > > > > > > > > > > > > the "GPU Support" as some general "Extended
> > > > Resource
> > > > > > > > > Support".
> > > > > > > > > > We
> > > > > > > > > > > > > > believe
> > > > > > > > > > > > > > > > supporting extended resources in a general
> > > > mechanism
> > > > > is
> > > > > > > > > > definitely
> > > > > > > > > > > > a
> > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > and extensible way. The reason we propose
> this
> > FLIP
> > > > > > > > narrowing
> > > > > > > > > > its
> > > > > > > > > > > > scope
> > > > > > > > > > > > > > > > down to GPU alone, is mainly for the concern
> on
> > > > extra
> > > > > > > > efforts
> > > > > > > > > > and
> > > > > > > > > > > > > > review
> > > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To come up with a well design on a general
> > extended
> > > > > > > > resource
> > > > > > > > > > > > management
> > > > > > > > > > > > > > > > mechanism, we would need to investigate more
> > on how
> > > > > > > people
> > > > > > > > > use
> > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > kind of resources in practice. For GPU, we
> > learnt
> > > > > such
> > > > > > > > > > knowledge
> > > > > > > > > > > > from
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > experts, Becket and his team members. But for
> > FPGA,
> > > > > or
> > > > > > > > other
> > > > > > > > > > > > potential
> > > > > > > > > > > > > > > > extended resources, we don't have such
> > convenient
> > > > > > > > information
> > > > > > > > > > > > sources,
> > > > > > > > > > > > > > > > making the investigation requires more
> efforts,
> > > > > which I
> > > > > > > > tend
> > > > > > > > > to
> > > > > > > > > > > > think
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On the other hand, we also looked into how
> > Spark
> > > > > > > supports a
> > > > > > > > > > general
> > > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > Resource Scheduling". Assuming we want to
> have
> > a
> > > > > > similar
> > > > > > > > > > general
> > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > resource mechanism in the future, we believe
> > that
> > > > the
> > > > > > > > current
> > > > > > > > > > GPU
> > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > design can be easily extended, in an
> > incremental
> > > > way
> > > > > > > > without
> > > > > > > > > > too
> > > > > > > > > > > > many
> > > > > > > > > > > > > > > > reworks.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - The most important part is probably user
> > > > > interfaces.
> > > > > > > > Spark
> > > > > > > > > > > > offers
> > > > > > > > > > > > > > > > configuration options to define the amount,
> > > > discovery
> > > > > > > > script
> > > > > > > > > > and
> > > > > > > > > > > > > > vendor
> > > > > > > > > > > > > > > > (on
> > > > > > > > > > > > > > > > k8s) on a per-resource-type basis [1], which
> is
> > very
> > > > > > > similar
> > > > > > > > > to
> > > > > > > > > > > > what
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > proposed in this FLIP. I think it's not
> > necessary
> > > > to
> > > > > > > expose
> > > > > > > > > > > > config
> > > > > > > > > > > > > > > > options
> > > > > > > > > > > > > > > > in the general way atm, since we do not have
> > > > supports
> > > > > > for
> > > > > > > > > other
> > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > types now. If later we decided to have per
> > resource
> > > > > > type
> > > > > > > > > config
> > > > > > > > > > > > > > > > options, we
> > > > > > > > > > > > > > > > can have backwards compatibility on the
> current
> > > > > > proposed
> > > > > > > > > > options
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > simple key mapping.
> > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can
> > > > change
> > > > > it
> > > > > > > to
> > > > > > > > a
> > > > > > > > > > > > > > "Extended
> > > > > > > > > > > > > > > > Resource Manager" (or whatever it is called).
> > That
> > > > > > should
> > > > > > > > be
> > > > > > > > > a
> > > > > > > > > > > > pure
> > > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there
> > are
> > > > > > already
> > > > > > > > > > > > fields for
> > > > > > > > > > > > > > > > general extended resource. We can of course
> > > > leverage
> > > > > > them
> > > > > > > > > when
> > > > > > > > > > > > > > > > supporting
> > > > > > > > > > > > > > > > fine grained GPU scheduling. That is also not
> > in
> > > > the
> > > > > > > scope
> > > > > > > > of
> > > > > > > > > > > > this
> > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > > step proposal, and would require FLIP-56 to
> be
> > > > > finished
> > > > > > > > > first.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate FLIP
> > > > > > > > > > > > > > > > for the general extended resource mechanism, and keep it in mind
> > > > > > > > > > > > > > > > when discussing and implementing the current one.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes
> total
> > > > sense
> > > > > to
> > > > > > > > > > generalize
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > resource management to support custom
> > resources.
> > > > > > Having
> > > > > > > > > that
> > > > > > > > > > > > allows
> > > > > > > > > > > > > > users
> > > > > > > > > > > > > > > > > to add new resources by themselves. The
> > general
> > > > > > > resource
> > > > > > > > > > > > management
> > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 1. The custom resource type definition. It
> is
> > > > > > supported
> > > > > > > > by
> > > > > > > > > > the
> > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > resources in ResourceProfile and
> > ResourceSpec.
> > > > This
> > > > > > > will
> > > > > > > > > > likely
> > > > > > > > > > > > cover
> > > > > > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 2. The custom resource allocation logic,
> > i.e. how
> > > > > to
> > > > > > > > assign
> > > > > > > > > > the
> > > > > > > > > > > > > > resources
> > > > > > > > > > > > > > > > > to different tasks, operators, and so on.
> > This
> > > > may
> > > > > > > > require
> > > > > > > > > > two
> > > > > > > > > > > > > > levels /
> > > > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks
> > are put
> > > > > > into
> > > > > > > > > > > > suitable
> > > > > > > > > > > > > > > > slots.
> > > > > > > > > > > > > > > > > It is done by the global RM and is not
> > > > customizable
> > > > > > > right
> > > > > > > > > > now.
> > > > > > > > > > > > > > > > > b. Operator level - map the exact resource
> > to the
> > > > > > > > operators
> > > > > > > > > > > > in
> > > > > > > > > > > > > > TM.
> > > > > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B.
> > This
> > > > > step
> > > > > > > is
> > > > > > > > > > needed
> > > > > > > > > > > > > > assuming
> > > > > > > > > > > > > > > > > the global RM does not distinguish
> individual
> > > > > > resources
> > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > > > same
> > > > > > > > > > > > > > type.
> > > > > > > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here.
> > So it
> > > > > > > should
> > > > > > > > > > > > discover the
> > > > > > > > > > > > > > > > > physical GPU information and bind/match
> them
> > to
> > > > > each
> > > > > > > > > > operators.
> > > > > > > > > > > > > > Making
> > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > general will fill in the missing piece to
> > support
> > > > > > > custom
> > > > > > > > > > resource
> > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > definition. But I'd avoid calling it a
> > "External
> > > > > > > Resource
> > > > > > > > > > > > Manager" to
> > > > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > > > > confusion with RM, maybe something like
> > "Operator
> > > > > > > > Resource
> > > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > be more accurate. So for each resource type
> > users
> > > > > can
> > > > > > > > have
> > > > > > > > > an
> > > > > > > > > > > > > > optional
> > > > > > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For
> > > > memory,
> > > > > > > users
> > > > > > > > > > don't
> > > > > > > > > > > > need
> > > > > > > > > > > > > > > > this,
> > > > > > > > > > > > > > > > > but for other extended resources, users may
> > need
> > > > > > that.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator
> > Resource
> > > > > > > > Assigner"
> > > > > > > > > > is
> > > > > > > > > > > > > > achievable
> > > > > > > > > > > > > > > > > in this FLIP. But I am also OK with having
> > that
> > > > in
> > > > > a
> > > > > > > > > separate
> > > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > the interface between the "Operator
> Resource
> > > > > > Assigner"
> > > > > > > > and
> > > > > > > > > > > > operator
> > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > take a while to settle down if we want to
> > make it
> > > > > > > > generic.
> > > > > > > > > > But I
> > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > > > implementation should take this future work
> > into
> > > > > > > > > > consideration so
> > > > > > > > > > > > > > that we
> > > > > > > > > > > > > > > > > don't need to break backwards compatibility
> > once
> > > > we
> > > > > > > have
> > > > > > > > > > that.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan
> Ewen
> > <
> > > > > > > > > > sewen@apache.org>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I cannot really give much input into the
> > > > > mechanics
> > > > > > of
> > > > > > > > > > GPU-aware
> > > > > > > > > > > > > > > > > scheduling
> > > > > > > > > > > > > > > > > > and GPU allocation, as I have no
> experience
> > > > with
> > > > > > > that.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > One thought I had when reading the
> > proposal is
> > > > if
> > > > > > it
> > > > > > > > > makes
> > > > > > > > > > > > sense to
> > > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > the "GPU Manager" as an "External
> Resource
> > > > > > Manager",
> > > > > > > > and
> > > > > > > > > > GPU
> > > > > > > > > > > > is one
> > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile
> > and
> > > > > > > > > ResourceSpec,
> > > > > > > > > > > > that is
> > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > > > It has the advantage that it looks more
> > > > > extensible.
> > > > > > > > Maybe
> > > > > > > > > > > > there is
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU
> > Resource,
> > > > and
> > > > > > FPGA
> > > > > > > > > > > > Resource, a
> > > > > > > > > > > > > > > > Alibaba
> > > > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket
> Qin <
> > > > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU
> resource
> > > > > > management
> > > > > > > > > > support
> > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > > > > > > for machine learning use cases.
> Actually
> > it
> > > > is
> > > > > > one
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > > mostly
> > > > > > > > > > > > > > > > asked
> > > > > > > > > > > > > > > > > > > question from the users who are
> > interested in
> > > > > > using
> > > > > > > > > Flink
> > > > > > > > > > > > for ML.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Some quick comments / questions to the
> > wiki.
> > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably
> > also
> > > > be
> > > > > > > > > > mentioned in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU
> > info
> > > > > > also a
> > > > > > > > > > public
> > > > > > > > > > > > API?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong
> > Song
> > > > <
> > > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and
> > kicking
> > > > off
> > > > > > the
> > > > > > > > > > > > discussion,
> > > > > > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting
> > using
> > > > of
> > > > > > GPU
> > > > > > > in
> > > > > > > > > > Flink
> > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and
> it
> > > > looks
> > > > > > good
> > > > > > > > to
> > > > > > > > > > me. I
> > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > > > > > > very good first step for Flink's GPU
> > > > > supports.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM
> Yangze
> > Guo
> > > > <
> > > > > > > > > > > > karmagyz@gmail.com
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > We would like to start a discussion
> > > > thread
> > > > > on
> > > > > > > > > > "FLIP-108:
> > > > > > > > > > > > Add
> > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the
> > following
> > > > > > > issues:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many
> > GPUs
> > > > > in a
> > > > > > > > task
> > > > > > > > > > > > executor
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > forward such requirements to the
> > external
> > > > > > > > resource
> > > > > > > > > > > > managers
> > > > > > > > > > > > > > (for
> > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > > - Provide information of available
> > GPU
> > > > > > > resources
> > > > > > > > to
> > > > > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP
> are
> > as
> > > > > > > follows:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements
> > to
> > > > > > > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of
> the
> > task
> > > > > > > manager
> > > > > > > > > > > > services to
> > > > > > > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > > > > > > and expose GPU resource information
> > to
> > > > the
> > > > > > > > context
> > > > > > > > > of
> > > > > > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > > > > > - Introduce the default script for
> > GPU
> > > > > > > discovery,
> > > > > > > > > in
> > > > > > > > > > > > which we
> > > > > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > > > > > the privilege mode to help user to
> > > > achieve
> > > > > > > > > > worker-level
> > > > > > > > > > > > > > isolation
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Please find more details in the
> FLIP
> > wiki
> > > > > > > > document
> > > > > > > > > > [1].
> > > > > > > > > > > > > > Looking
> > > > > > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Stephan Ewen <se...@apache.org>.
Nice, thanks a lot!

On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <ka...@gmail.com> wrote:

> Thanks for the suggestion, @Stephan, @Becket and @Xintong.
>
> I've updated the FLIP accordingly. I do not add a
> ResourceInfoProvider. Instead, I introduce the ExternalResourceDriver,
> which takes the responsibility of all relevant operations on both RM
> and TM sides.
> After a rethink about decoupling the management of external resources
> from TaskExecutor, I think we could do the same thing on the
> ResourceManager side. We do not need to add a specific allocation
> logic to the ResourceManager each time we add a specific external
> resource.
> - For Yarn, we need the ExternalResourceDriver to edit the
> containerRequest.
> - For Kubernetes, ExternalResourceDriver could provide a decorator for
> the TM pod.
>
> In this way, just like MetricReporter, we allow users to define their
> custom ExternalResourceDriver. It is more extensible and fits the
> separation of concerns. For more details, please take a look at [1].
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
>
> Best,
> Yangze Guo
>
> On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <se...@apache.org> wrote:
> >
> > This sounds good to go ahead from my side.
> >
> > I like the approach that Becket suggested - in that case the core
> > abstraction that everyone would need to understand would be "external
> > resource allocation" and the "ResourceInfoProvider", and the GPU specific
> > code would be a specific implementation only known to that component that
> > allocates the external resource. That fits the separation of concerns
> well.
> >
> > I also understand that it should not be over-engineered in the first
> > version, so some simplification makes sense, and then gradually expand
> from
> > there.
> >
> > So +1 to go ahead with what was suggested above (Xintong / Becket) from
> my
> > side.
> >
> > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <to...@gmail.com>
> wrote:
> >
> > > Thanks for the comments, Stephan & Becket.
> > >
> > > @Stephan
> > >
> > > I see your concern, and I completely agree with you that we should
> first
> > > think about the "library" / "plugin" / "extension" style if possible.
> > >
> > > If GPUs are sliced and assigned during scheduling, there may be reason,
> > > > although it looks that it would belong to the slot then. Is that
> what we
> > > > are doing here?
> > >
> > >
> > > In the current proposal, we do not have the GPUs sliced and assigned to
> > > slots, because it could be problematic without dynamic slot allocation.
> > > E.g., the number of GPUs might not be evenly divisible by the number of
> > > slots.
> > >
> > > I think it makes sense to eventually have the GPUs assigned to slots.
> Even
> > > then, we might still need a TM level GPUManager (or ResourceProvider
> like
> > > Becket suggested). For memory, in each slot we can simply request the
> > > amount of memory, leaving it to JVM / OS to decide which memory
> (address)
> > > should be assigned. For GPU, and potentially other resources like
> FPGA, we
> > > need to explicitly specify which GPU (index) should be used.
> Therefore, we
> > > need some component at the TM level to coordinate which slot uses which
> > > GPU.
> > >
> > > IMO, unless we say Flink will not support slot-level GPU slicing at
> least
> > > in the foreseeable future, I don't see a good way to avoid touching
> the TM
> > > core. To that end, I think Becket's suggestion points to a good
> direction,
> > > that supports more features (GPU, FPGA, etc.) with less coupling to
> the TM
> > > core (only needs to understand the general interfaces). The detailed
> > > implementation for specific resource types can even be encapsulated as
> a
> > > library.
> > >
> > > @Becket
> > >
> > > > Thanks for sharing your thought on the final state. Regardless of the
> > > > details of how the interfaces should look, I think this is a really good
> > > > abstraction for supporting general resource types.
> > >
> > > I'd like to further clarify that, the following three things are all
> that
> > > the "Flink core" needs to understand.
> > >
> > >    - The *amount* of resource, for scheduling. Actually, we already
> have
> > >    the Resource class in ResourceProfile and ResourceSpec for extended
> > >    resource. It's just not really used.
> > >    - The *info*, that Flink provides to the operators / user codes.
> > >    - The *provider*, which generates the info based on the amount.
> > >
> > > The "core" does not need to understand the specific implementation
> details
> > > of the above three. They can even be implemented in a 3rd-party
> library.
> > > Similar to how we allow users to define their custom MetricReporter.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <be...@gmail.com>
> wrote:
> > >
> > > > Thanks for the comment, Stephan.
> > > >
> > > >   - If everything becomes a "core feature", it will make the project
> hard
> > > > > to develop in the future. Thinking "library" / "plugin" /
> "extension"
> > > > style
> > > > > where possible helps.
> > > >
> > > >
> > > > Completely agree. It is much more important to design a mechanism
> than
> > > > focusing on a specific case. Here is what I am thinking to fully
> support
> > > > custom resource management:
> > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to define
> > > > > the resource and the amount required. They will be used to find suitable
> > > > > TM slots to run the tasks. At this point, the resources are only measured
> > > > > by amount, i.e. they do not have individual IDs.
> > > >
> > > > > 2. On the TM side, have something like *"ResourceInfoProvider"* to
> > > > > identify and provide the detailed information of the individual
> > > > > resource, e.g. GPU ID. It is important because the operator may have to
> > > > > explicitly interact with the physical resource it uses. The
> > > > > ResourceInfoProvider might look like something below.
> > > > > interface ResourceInfoProvider<INFO> {
> > > > >     Map<AbstractID, INFO> retrieveResourceInfo(OperatorId opId,
> > > > >         ResourceProfile resourceProfile);
> > > > > }
> > > >
> > > > > - There could be several "*ResourceInfoProvider*" configured on the TM
> > > > > to retrieve the information for different resources.
> > > > > - The TM will be responsible for assigning those individual resources to
> > > > > each operator according to their requested amount.
> > > > operator according to their requested amount.
> > > > - The operators will be able to get the ResourceInfo from their
> > > > RuntimeContext.
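
A minimal sketch of how the ResourceInfoProvider idea quoted above might be implemented for GPUs. OperatorId and ResourceProfile here are simplified stand-ins rather than the Flink classes, String keys replace AbstractID, and GpuInfo, GpuInfoProvider, and the "gpu" resource name are hypothetical names chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration; not the real Flink classes.
final class OperatorId {
    final String name;
    OperatorId(String name) { this.name = name; }
}

final class ResourceProfile {
    // Extended resource name -> requested amount.
    final Map<String, Integer> extendedResources;
    ResourceProfile(Map<String, Integer> extendedResources) {
        this.extendedResources = extendedResources;
    }
}

// Shape taken from the proposal in the thread, with String keys instead of AbstractID.
interface ResourceInfoProvider<INFO> {
    Map<String, INFO> retrieveResourceInfo(OperatorId opId, ResourceProfile resourceProfile);
}

// Hypothetical info object: only the device index an operator should use.
final class GpuInfo {
    final int index;
    GpuInfo(int index) { this.index = index; }
}

// Hypothetical provider that hands out discovered GPU indexes, each at most once.
final class GpuInfoProvider implements ResourceInfoProvider<GpuInfo> {
    private final Deque<Integer> free = new ArrayDeque<>();

    GpuInfoProvider(List<Integer> discoveredIndexes) {
        free.addAll(discoveredIndexes);
    }

    @Override
    public synchronized Map<String, GpuInfo> retrieveResourceInfo(
            OperatorId opId, ResourceProfile resourceProfile) {
        int requested = resourceProfile.extendedResources.getOrDefault("gpu", 0);
        if (requested > free.size()) {
            throw new IllegalStateException("Not enough free GPUs for operator " + opId.name);
        }
        Map<String, GpuInfo> assigned = new LinkedHashMap<>();
        for (int i = 0; i < requested; i++) {
            int index = free.poll();
            assigned.put("gpu-" + index, new GpuInfo(index));
        }
        return assigned;
    }
}
```

In this sketch the core only sees the generic interface and the requested amount, while the mapping from amount to concrete device indexes stays inside the GPU-specific class.
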
> > > >
> > > > > If we agree this is a reasonable final state, we can adapt the current
> > > > > FLIP to it. In fact it does not sound like a big change to me. All the
> > > > > proposed configuration can stay as is; it is just that Flink itself won't
> > > > > care about them. Instead, a GPUInfoProvider implementing the
> > > > > ResourceInfoProvider will use them.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <se...@apache.org>
> wrote:
> > > >
> > > > > Hi all!
> > > > >
> > > > > The main point I wanted to throw into the discussion is the
> following:
> > > > >   - With more and more use cases, more and more tools go into Flink
> > > > >   - If everything becomes a "core feature", it will make the
> project
> > > hard
> > > > > to develop in the future. Thinking "library" / "plugin" /
> "extension"
> > > > style
> > > > > where possible helps.
> > > > >
> > > > >   - A good thought experiment is always: How many future developers
> > > have
> > > > to
> > > > > interact with this code (and possibly understand it partially),
> even if
> > > > the
> > > > > features they touch have nothing to do with GPU support. If many
> > > > > contributors to unrelated features will have to touch it and
> understand
> > > > it,
> > > > > then let's think if there is a different solution. Maybe there is
> not,
> > > > but
> > > > > then we should be sure why.
> > > > >
> > > > >   - That led me to raising this issue: If the GPU manager becomes a
> > > core
> > > > > service in the TaskManager, Environment, RuntimeContext, etc. then
> > > > everyone
> > > > > developing TM and streaming tasks need to understand the GPU
> manager.
> > > > That
> > > > > seems oddly specific, is my impression.
> > > > >
> > > > > Access to configuration seems not the right reason to do that. We
> > > should
> > > > > expose the Flink configuration from the RuntimeContext anyways.
> > > > >
> > > > > If GPUs are sliced and assigned during scheduling, there may be
> reason,
> > > > > although it looks that it would belong to the slot then. Is that
> what
> > > we
> > > > > are doing here?
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> tonysong820@gmail.com>
> > > > > wrote:
> > > > >
> > > > > >  Thanks for the feedback, Becket.
> > > > > >
> > > > > > > IMO, eventually an operator should only see info of GPUs that are
> > > > > > > dedicated to it, instead of all GPUs on the machine/container as in the
> > > > > > > current design. It does not make sense to let the user who writes a UDF
> > > > > > > worry about coordination among multiple operators running on the same
> > > > > > > machine. And if we want to limit the GPU info an operator sees, we should
> > > > > > > not let the operator instantiate GPUManager, which means we have to
> > > > > > > expose something through runtime context, either GPU info or some kind of
> > > > > > > limited access to the GPUManager.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <becket.qin@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > > It probably makes sense for us to first agree on the final state. More
> > > > > > > > specifically, will the resource info be exposed through runtime context
> > > > > > > > eventually?
> > > > > > > >
> > > > > > > > If that is the final state and we have a seamless migration story from
> > > > > > > > this FLIP to that final state, personally I think it is OK to expose the
> > > > > > > > GPU info in the runtime context.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > tonysong820@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > @Yangze,
> > > > > > > > I think what Stephan means (@Stephan, please correct me if
> I'm
> > > > wrong)
> > > > > > is
> > > > > > > > that, we might not need to hold and maintain the GPUManager
> as a
> > > > > > service
> > > > > > > in
> > > > > > > > TaskManagerServices or RuntimeContext. An alternative is to
> > > create
> > > > /
> > > > > > > > retrieve the GPUManager only in the operators that need it,
> e.g.,
> > > > > with
> > > > > > a
> > > > > > > > static method `GPUManager.get()`.
> > > > > > > >
> > > > > > > > @Stephan,
> > > > > > > > I agree with you on excluding GPUManager from
> > > TaskManagerServices.
> > > > > > > >
> > > > > > > >    - For the first step, where we provide unified TM-level
> GPU
> > > > > > > information
> > > > > > > >    to all operators, it should be fine to have operators
> access /
> > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > >    - In future, we might have some more fine-grained GPU
> > > > management,
> > > > > > > where
> > > > > > > >    we need to maintain GPUManager as a service and put GPU
> info
> > > in
> > > > > slot
> > > > > > > >    profiles. But at least for now it's not necessary to
> introduce
> > > > > such
> > > > > > > >    complexity.
> > > > > > > >
> > > > > > > > > However, I have some concerns about excluding GPUManager from
> > > > > > > > > RuntimeContext and letting operators access it directly.
> > > > > > > >
> > > > > > > > >    - Configurations needed for creating the GPUManager are not always
> > > > > > > > >    available for operators.
> > > > > > > >    - If later we want to have fine-grained control over GPU
> > > (e.g.,
> > > > > > > >    operators in each slot can only see GPUs reserved for that
> > > > slot),
> > > > > > the
> > > > > > > >    approach cannot be easily extended.
> > > > > > > >
> > > > > > > > > I would suggest wrapping the GPUManager behind RuntimeContext and only
> > > > > > > > > exposing the GPUInfo to users. For now, we can declare a method
> > > > > > > > `getGPUInfo()` in RuntimeContext, with a default definition
> that
> > > > > calls
> > > > > > > > `GPUManager.get()` to get the lazily-created GPUManager. If
> later
> > > > we
> > > > > > want
> > > > > > > > to create / retrieve GPUManager in a different way, we can
> simply
> > > > > > change
> > > > > > > > how `getGPUInfo` is implemented, without needing to change
> any
> > > > public
> > > > > > > > interfaces.
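
A minimal sketch of the `getGPUInfo()` suggestion quoted above, assuming a lazily created singleton behind `GPUManager.get()`. RuntimeContext here is a simplified stand-in for the Flink interface, and GPUInfo, its `getIndexes()` accessor, and the hard-coded device index are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical value object exposed to user code.
final class GPUInfo {
    private final Set<Integer> indexes;

    GPUInfo(Set<Integer> indexes) {
        this.indexes = Collections.unmodifiableSet(new LinkedHashSet<>(indexes));
    }

    Set<Integer> getIndexes() { return indexes; }
}

// Hypothetical manager, created lazily on first access.
final class GPUManager {
    private static GPUManager instance;
    private final GPUInfo info;

    private GPUManager(GPUInfo info) { this.info = info; }

    static synchronized GPUManager get() {
        if (instance == null) {
            // A real manager would run the discovery script here;
            // this sketch hard-codes a single GPU with index 0.
            instance = new GPUManager(new GPUInfo(Collections.singleton(0)));
        }
        return instance;
    }

    GPUInfo getGPUInfo() { return info; }
}

// Simplified stand-in for RuntimeContext: user code only ever sees GPUInfo.
interface RuntimeContext {
    default GPUInfo getGPUInfo() {
        return GPUManager.get().getGPUInfo();
    }
}
```

Because callers depend only on `getGPUInfo()`, an implementation could later override the default to return slot-scoped information without changing the user-facing method.
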
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> karmagyz@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > > @Stephan
> > > > > > > > > Do you mean Minicluster? Yes, it makes sense to share the
> GPU
> > > > > Manager
> > > > > > > > > in such scenario.
> > > > > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > > > > GPUManager(ExternalResourceManagers) in TaskExecutor
> instead of
> > > > > > > > > TaskManagerServices.
> > > > > > > > >
> > > > > > > > > Regarding the RuntimeContext/FunctionContext, it just
> holds the
> > > > GPU
> > > > > > > > > info instead of the GPU Manager. AFAIK, it's the only
> place we
> > > > > could
> > > > > > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > > > > isaac@paddlesoft.net
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org
> > > wrote
> > > > > > ----
> > > > > > > > > >
> > > > > > > > > > > > Can we somehow keep this out of the TaskManager
> services
> > > > > > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > > > > > ExternalServicesManagers in future) is conceptually
> one of
> > > > the
> > > > > > task
> > > > > > > > > > > manager services, just like MemoryManager before 1.10.
> > > > > > > > > > > - It maintains/holds the GPU resource at TM level and
> all
> > > of
> > > > > the
> > > > > > > > > > > operators allocate the GPU resources from it. So, it
> should
> > > > be
> > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > - We could add a collection called
> ExternalResourceManagers
> > > > to
> > > > > > hold
> > > > > > > > > > > all managers of other external resources in the future.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Can you help me understand why this needs the addition in
> > > > > > > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > Are you worried about the case when multiple Task
> Executors
> > > run
> > > > > in
> > > > > > > the
> > > > > > > > > same
> > > > > > > > > > JVM? That's not common, but wouldn't it actually be good
> in
> > > > that
> > > > > > case
> > > > > > > > to
> > > > > > > > > > share the GPU Manager, given that the GPU is shared?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Stephan
> > > > > > > > > >
> > > > > > > > > > ---------------------------
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > In this FLIP, operators need the information. Thus, we
> > > expose
> > > > > GPU
> > > > > > > > > > > information to the RuntimeContext/FunctionContext. The
> slot
> > > > > > profile
> > > > > > > > is
> > > > > > > > > > > not aware of GPU resources as GPU is TM level resource
> now.
> > > > > > > > > > >
> > > > > > > > > > > > Can the GPU Manager be a "self contained" thing that
> > > simply
> > > > > > takes
> > > > > > > > the
> > > > > > > > > > > configuration, and then abstracts everything
> internally?
> > > > > > > > > > > Yes, we just pass the path/args of the discover script
> and
> > > > how
> > > > > > many
> > > > > > > > > > > GPUs per TM to it. It takes the responsibility to get
> the
> > > GPU
> > > > > > > > > > > information and expose them to the
> > > > > RuntimeContext/FunctionContext
> > > > > > > of
> > > > > > > > > > > > Operators. Meanwhile, we'd better not allow operators to directly
> > > > > > > > > > > > access GPUManager; they should get what they want from the Context.
> > > > > > > > > > > > We could then decouple the interface/implementation of GPUManager
> > > > > > > > > > > > from the Public API.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Yangze Guo
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> > > > sewen@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > It sounds fine to initially start with GPU specific
> > > support
> > > > > and
> > > > > > > > think
> > > > > > > > > > > about
> > > > > > > > > > > > generalizing this once we better understand the
> space.
> > > > > > > > > > > >
> > > > > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > > > > - Can we somehow keep this out of the TaskManager
> > > services?
> > > > > > > > Anything
> > > > > > > > > we
> > > > > > > > > > > > have to pull through all layers of the TM makes the
> TM
> > > > > > components
> > > > > > > > yet
> > > > > > > > > > > more
> > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > >
> > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > -> do the slot profiles need information about the
> GPU?
> > > > > > > > > > > > -> Can the GPU Manager be a "self contained" thing
> that
> > > > > simply
> > > > > > > > takes
> > > > > > > > > > > > the configuration, and then abstracts everything
> > > > internally?
> > > > > > > > > Operators
> > > > > > > > > > > can
> > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > > > > karmagyz@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right,
> I'll add
> > > > > them
> > > > > > to
> > > > > > > > the
> > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > Regarding the general extended resource mechanism,
> I
> > > > second
> > > > > > > > > Xintong's
> > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > - It's better to leverage ResourceProfile and
> > > > ResourceSpec
> > > > > > > after
> > > > > > > > we
> > > > > > > > > > > > > supporting fine-grained GPU scheduling. As a first
> step
> > > > > > > > proposal, I
> > > > > > > > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I
> > > > > understand
> > > > > > > > > > > > > correctly, it just a code refactoring atm, we could
> > > > extract
> > > > > > the
> > > > > > > > > > > > > open/close/allocateExtendResources of GPUManager to
> > > that
> > > > > > > > > interface. If
> > > > > > > > > > > > > that is the case, +1 to do it during
> implementation.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > As Xintong said, we looked into how Spark supports
> a
> > > > > general
> > > > > > > > > "Custom
> > > > > > > > > > > > > Resource Scheduling" before and decided to
> introduce a
> > > > > common
> > > > > > > > > resource
> > > > > > > > > > > > > > > configuration schema
> > > > > > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > to make it more extensible. I think the "resource"
> is a
> > > > > > proper
> > > > > > > > > level
> > > > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > > > > > hxbks2ks@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > There is no doubt that GPU resource management
> > > support
> > > > > will
> > > > > > > > > greatly
> > > > > > > > > > > > > > facilitate the development of AI-related
> applications
> > > > by
> > > > > > > > PyFlink
> > > > > > > > > > > users.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regarding the names of several GPU
> configurations, I
> > > > > think
> > > > > > it
> > > > > > > > is
> > > > > > > > > > > better
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > delete the resource field makes it consistent
> with
> > > the
> > > > > > names
> > > > > > > of
> > > > > > > > > other
> > > > > > > > > > > > > > resource-related configurations in
> TaskManagerOption.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > e.g.
> taskmanager.resource.gpu.discovery-script.path
> > > ->
> > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song <to...@gmail.com>
> 于2020年3月4日周三
> > > > > > 上午10:39写道:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an
> offline
> > > > > > discussion
> > > > > > > > > about
> > > > > > > > > > > > > making
> > > > > > > > > > > > > > > the "GPU Support" as some general "Extended
> > > Resource
> > > > > > > > Support".
> > > > > > > > > We
> > > > > > > > > > > > > believe
> > > > > > > > > > > > > > > supporting extended resources in a general
> > > mechanism
> > > > is
> > > > > > > > > definitely
> > > > > > > > > > > a
> > > > > > > > > > > > > good
> > > > > > > > > > > > > > > and extensible way. The reason we propose this
> FLIP
> > > > > > > narrowing
> > > > > > > > > its
> > > > > > > > > > > scope
> > > > > > > > > > > > > > > down to GPU alone, is mainly for the concern on
> > > extra
> > > > > > > efforts
> > > > > > > > > and
> > > > > > > > > > > > > review
> > > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > To come up with a well design on a general
> extended
> > > > > > > resource
> > > > > > > > > > > management
> > > > > > > > > > > > > > > mechanism, we would need to investigate more
> on how
> > > > > > people
> > > > > > > > use
> > > > > > > > > > > > > different
> > > > > > > > > > > > > > > kind of resources in practice. For GPU, we
> learnt
> > > > such
> > > > > > > > > knowledge
> > > > > > > > > > > from
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > experts, Becket and his team members. But for
> FPGA,
> > > > or
> > > > > > > other
> > > > > > > > > > > potential
> > > > > > > > > > > > > > > extended resources, we don't have such
> convenient
> > > > > > > information
> > > > > > > > > > > sources,
> > > > > > > > > > > > > > > making the investigation requires more efforts,
> > > > which I
> > > > > > > tend
> > > > > > > > to
> > > > > > > > > > > think
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On the other hand, we also looked into how
> Spark
> > > > > > supports a
> > > > > > > > > general
> > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > Resource Scheduling". Assuming we want to have
> a
> > > > > similar
> > > > > > > > > general
> > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > resource mechanism in the future, we believe
> that
> > > the
> > > > > > > current
> > > > > > > > > GPU
> > > > > > > > > > > > > support
> > > > > > > > > > > > > > > design can be easily extended, in an
> incremental
> > > way
> > > > > > > without
> > > > > > > > > too
> > > > > > > > > > > many
> > > > > > > > > > > > > > > reworks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - The most important part is probably user
> > > > interfaces.
> > > > > > > Spark
> > > > > > > > > > > offers
> > > > > > > > > > > > > > > configuration options to define the amount,
> > > discovery
> > > > > > > script
> > > > > > > > > and
> > > > > > > > > > > > > vendor
> > > > > > > > > > > > > > > (on
> > > > > > > > > > > > > > > k8s) in a per resource type bias [1], which is
> very
> > > > > > similar
> > > > > > > > to
> > > > > > > > > > > what
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > proposed in this FLIP. I think it's not
> necessary
> > > to
> > > > > > expose
> > > > > > > > > > > config
> > > > > > > > > > > > > > > options
> > > > > > > > > > > > > > > in the general way atm, since we do not have
> > > supports
> > > > > for
> > > > > > > > other
> > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > types now. If later we decided to have per
> resource
> > > > > type
> > > > > > > > config
> > > > > > > > > > > > > > > options, we
> > > > > > > > > > > > > > > can have backwards compatibility on the current
> > > > > proposed
> > > > > > > > > options
> > > > > > > > > > > > > with
> > > > > > > > > > > > > > > simple key mapping.
> > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can
> > > change
> > > > it
> > > > > > to
> > > > > > > a
> > > > > > > > > > > > > "Extended
> > > > > > > > > > > > > > > Resource Manager" (or whatever it is called).
> That
> > > > > should
> > > > > > > be
> > > > > > > > a
> > > > > > > > > > > pure
> > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there
> are
> > > > > already
> > > > > > > > > > > fields for
> > > > > > > > > > > > > > > general extended resource. We can of course
> > > leverage
> > > > > them
> > > > > > > > when
> > > > > > > > > > > > > > > supporting
> > > > > > > > > > > > > > > fine grained GPU scheduling. That is also not
> in
> > > the
> > > > > > scope
> > > > > > > of
> > > > > > > > > > > this
> > > > > > > > > > > > > first
> > > > > > > > > > > > > > > step proposal, and would require FLIP-56 to be
> > > > finished
> > > > > > > > first.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > To summary up, I agree with Becket that have a
> > > > separate
> > > > > > > FLIP
> > > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > > > > > general extended resource mechanism, and keep
> it in
> > > > > mind
> > > > > > > when
> > > > > > > > > > > > > discussing
> > > > > > > > > > > > > > > and implementing the current one.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > That's a good point, Stephan. It makes total
> > > sense
> > > > to
> > > > > > > > > generalize
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > resource management to support custom
> resources.
> > > > > Having
> > > > > > > > that
> > > > > > > > > > > allows
> > > > > > > > > > > > > users
> > > > > > > > > > > > > > > > to add new resources by themselves. The
> general
> > > > > > resource
> > > > > > > > > > > management
> > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. The custom resource type definition. It is
> > > > > supported
> > > > > > > by
> > > > > > > > > the
> > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > resources in ResourceProfile and
> ResourceSpec.
> > > This
> > > > > > will
> > > > > > > > > likely
> > > > > > > > > > > cover
> > > > > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. The custom resource allocation logic,
> i.e. how
> > > > to
> > > > > > > assign
> > > > > > > > > the
> > > > > > > > > > > > > resources
> > > > > > > > > > > > > > > > to different tasks, operators, and so on.
> This
> > > may
> > > > > > > require
> > > > > > > > > two
> > > > > > > > > > > > > levels /
> > > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks
> are put
> > > > > into
> > > > > > > > > > > suitable
> > > > > > > > > > > > > > > slots.
> > > > > > > > > > > > > > > > It is done by the global RM and is not
> > > customizable
> > > > > > right
> > > > > > > > > now.
> > > > > > > > > > > > > > > > b. Operator level - map the exact resource
> to the
> > > > > > > operators
> > > > > > > > > > > in
> > > > > > > > > > > > > TM.
> > > > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B.
> This
> > > > step
> > > > > > is
> > > > > > > > > needed
> > > > > > > > > > > > > assuming
> > > > > > > > > > > > > > > > the global RM does not distinguish individual
> > > > > resources
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > same
> > > > > > > > > > > > > type.
> > > > > > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here.
> So it
> > > > > > should
> > > > > > > > > > > discover the
> > > > > > > > > > > > > > > > physical GPU information and bind/match them
> to
> > > > each
> > > > > > > > > operators.
> > > > > > > > > > > > > Making
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > general will fill in the missing piece to
> support
> > > > > > custom
> > > > > > > > > resource
> > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > definition. But I'd avoid calling it a
> "External
> > > > > > Resource
> > > > > > > > > > > Manager" to
> > > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > > > confusion with RM, maybe something like
> "Operator
> > > > > > > Resource
> > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > be more accurate. So for each resource type
> users
> > > > can
> > > > > > > have
> > > > > > > > an
> > > > > > > > > > > > > optional
> > > > > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For
> > > memory,
> > > > > > users
> > > > > > > > > don't
> > > > > > > > > > > need
> > > > > > > > > > > > > > > this,
> > > > > > > > > > > > > > > > but for other extended resources, users may
> need
> > > > > that.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Personally I think a pluggable "Operator
> Resource
> > > > > > > Assigner"
> > > > > > > > > is
> > > > > > > > > > > > > achievable
> > > > > > > > > > > > > > > > in this FLIP. But I am also OK with having
> that
> > > in
> > > > a
> > > > > > > > separate
> > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > the interface between the "Operator Resource
> > > > > Assigner"
> > > > > > > and
> > > > > > > > > > > operator
> > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > take a while to settle down if we want to
> make it
> > > > > > > generic.
> > > > > > > > > But I
> > > > > > > > > > > > > think
> > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > > implementation should take this future work
> into
> > > > > > > > > consideration so
> > > > > > > > > > > > > that we
> > > > > > > > > > > > > > > > don't need to break backwards compatibility
> once
> > > we
> > > > > > have
> > > > > > > > > that.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen
> <
> > > > > > > > > sewen@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I cannot really give much input into the
> > > > mechanics
> > > > > of
> > > > > > > > > GPU-aware
> > > > > > > > > > > > > > > > scheduling
> > > > > > > > > > > > > > > > > and GPU allocation, as I have no experience
> > > with
> > > > > > that.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > One thought I had when reading the
> proposal is
> > > if
> > > > > it
> > > > > > > > makes
> > > > > > > > > > > sense to
> > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > the "GPU Manager" as an "External Resource
> > > > > Manager",
> > > > > > > and
> > > > > > > > > GPU
> > > > > > > > > > > is one
> > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > > > > The way I understand the ResourceProfile
> and
> > > > > > > > ResourceSpec,
> > > > > > > > > > > that is
> > > > > > > > > > > > > how
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > > It has the advantage that it looks more
> > > > extensible.
> > > > > > > Maybe
> > > > > > > > > > > there is
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU
> Resource,
> > > and
> > > > > FPGA
> > > > > > > > > > > Resource, a
> > > > > > > > > > > > > > > Alibaba
> > > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource
> > > > > management
> > > > > > > > > support
> > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > > > > > for machine learning use cases. Actually
> it
> > > is
> > > > > one
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > mostly
> > > > > > > > > > > > > > > asked
> > > > > > > > > > > > > > > > > > question from the users who are
> interested in
> > > > > using
> > > > > > > > Flink
> > > > > > > > > > > for ML.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Some quick comments / questions to the
> wiki.
> > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably
> also
> > > be
> > > > > > > > > mentioned in
> > > > > > > > > > > the
> > > > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU
> info
> > > > > also a
> > > > > > > > > public
> > > > > > > > > > > API?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong
> Song
> > > <
> > > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and
> kicking
> > > off
> > > > > the
> > > > > > > > > > > discussion,
> > > > > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting
> using
> > > of
> > > > > GPU
> > > > > > in
> > > > > > > > > Flink
> > > > > > > > > > > is
> > > > > > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it
> > > looks
> > > > > good
> > > > > > > to
> > > > > > > > > me. I
> > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > > > > > very good first step for Flink's GPU
> > > > supports.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze
> Guo
> > > <
> > > > > > > > > > > karmagyz@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > We would like to start a discussion
> > > thread
> > > > on
> > > > > > > > > "FLIP-108:
> > > > > > > > > > > Add
> > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the
> following
> > > > > > issues:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > - Enable user to configure how many
> GPUs
> > > > in a
> > > > > > > task
> > > > > > > > > > > executor
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > forward such requirements to the
> external
> > > > > > > resource
> > > > > > > > > > > managers
> > > > > > > > > > > > > (for
> > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > > - Provide information of available
> GPU
> > > > > > resources
> > > > > > > to
> > > > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are
> as
> > > > > > follows:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements
> to
> > > > > > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the
> task
> > > > > > manager
> > > > > > > > > > > services to
> > > > > > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > > > > > and expose GPU resource information
> to
> > > the
> > > > > > > context
> > > > > > > > of
> > > > > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > > > > - Introduce the default script for
> GPU
> > > > > > discovery,
> > > > > > > > in
> > > > > > > > > > > which we
> > > > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > > > > the privilege mode to help user to
> > > achieve
> > > > > > > > > worker-level
> > > > > > > > > > > > > isolation
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Please find more details in the FLIP
> wiki
> > > > > > > document
> > > > > > > > > [1].
> > > > > > > > > > > > > Looking
> > > > > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > >

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Yangze Guo <ka...@gmail.com>.
Thanks for the suggestion, @Stephan, @Becket and @Xintong.

I've updated the FLIP accordingly. I did not add a
ResourceInfoProvider. Instead, I introduced the ExternalResourceDriver,
which takes responsibility for all relevant operations on both the RM
and TM sides.
After rethinking how to decouple the management of external resources
from TaskExecutor, I think we could do the same thing on the
ResourceManager side. We would not need to add resource-specific
allocation logic to the ResourceManager each time we add a new external
resource.
- For Yarn, we need the ExternalResourceDriver to edit the containerRequest.
- For Kubernetes, the ExternalResourceDriver could provide a decorator for
the TM pod.

In this way, just like MetricReporter, we allow users to define their
custom ExternalResourceDriver. It is more extensible and fits the
separation of concerns. For more details, please take a look at [1].
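
To illustrate the plug-in idea, here is a minimal sketch. It is not the
interface from the FLIP wiki: the method names, the DriverRegistry class,
the Map-based "request" and the "example.com/gpu" key are all assumptions
made up for this example. The only point is that the RM/TM core talks to a
generic driver, while the GPU-specific logic sits in one implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in contract; the real interface lives in the FLIP wiki.
interface ExternalResourceDriver {
    // Name used to look the driver up, e.g. "gpu".
    String resourceName();

    // RM side: add this resource to a simplified container/pod request.
    void decorateRequest(Map<String, Long> request, long amount);

    // TM side: return identifiers of the devices backing the granted amount.
    List<String> discover(long amount);
}

// Illustrative GPU driver; the request key is a placeholder, not a real Yarn/K8s key.
final class GpuDriver implements ExternalResourceDriver {
    @Override
    public String resourceName() { return "gpu"; }

    @Override
    public void decorateRequest(Map<String, Long> request, long amount) {
        request.put("example.com/gpu", amount);
    }

    @Override
    public List<String> discover(long amount) {
        List<String> ids = new ArrayList<>();
        for (long i = 0; i < amount; i++) {
            ids.add("gpu-" + i);
        }
        return ids;
    }
}

// The RM/TM core only knows this registry, never a concrete resource type.
final class DriverRegistry {
    private final Map<String, ExternalResourceDriver> drivers = new HashMap<>();

    void register(ExternalResourceDriver driver) {
        drivers.put(driver.resourceName(), driver);
    }

    // Build one request by letting every configured driver decorate it.
    Map<String, Long> buildRequest(Map<String, Long> configuredAmounts) {
        Map<String, Long> request = new HashMap<>();
        for (Map.Entry<String, Long> e : configuredAmounts.entrySet()) {
            ExternalResourceDriver driver = drivers.get(e.getKey());
            if (driver != null) {
                driver.decorateRequest(request, e.getValue());
            }
        }
        return request;
    }
}
```

In this sketch, adding FPGA support would mean registering one more driver;
DriverRegistry and its callers would stay unchanged.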

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink

Best,
Yangze Guo

On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <se...@apache.org> wrote:
>
> This sounds good to go ahead from my side.
>
> I like the approach that Becket suggested - in that case the core
> abstraction that everyone would need to understand would be "external
> resource allocation" and the "ResourceInfoProvider", and the GPU specific
> code would be a specific implementation only known to that component that
> allocates the external resource. That fits the separation of concerns well.
>
> I also understand that it should not be over-engineered in the first
> version, so some simplification makes sense, and then gradually expand from
> there.
>
> So +1 to go ahead with what was suggested above (Xintong / Becket) from my
> side.
>
> On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <to...@gmail.com> wrote:
>
> > Thanks for the comments, Stephan & Becket.
> >
> > @Stephan
> >
> > I see your concern, and I completely agree with you that we should first
> > think about the "library" / "plugin" / "extension" style if possible.
> >
> > > If GPUs are sliced and assigned during scheduling, there may be reason,
> > > although it looks that it would belong to the slot then. Is that what we
> > > are doing here?
> >
> >
> > In the current proposal, we do not have the GPUs sliced and assigned to
> > slots, because it could be problematic without dynamic slot allocation.
> > E.g., the number of GPUs might not be evenly divisible by the number of
> > slots.
> >
> > I think it makes sense to eventually have the GPUs assigned to slots. Even
> > then, we might still need a TM level GPUManager (or ResourceProvider like
> > Becket suggested). For memory, in each slot we can simply request the
> > amount of memory, leaving it to JVM / OS to decide which memory (address)
> > should be assigned. For GPU, and potentially other resources like FPGA, we
> > need to explicitly specify which GPU (index) should be used. Therefore, we
> > need some component at the TM level to coordinate which slot uses which
> > GPU.
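
The coordination problem described in the paragraph above can be sketched
as follows. This is only an illustration under assumptions: the class name
GpuIndexAllocator, the String slot ids and the first-free assignment policy
are invented here and are not part of the FLIP. It shows why a single
TM-level component is needed: unlike memory, each slot has to be told which
concrete GPU indexes it owns.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical TM-level coordinator that decides which slot uses which GPU index.
final class GpuIndexAllocator {
    private final Deque<Integer> free = new ArrayDeque<>();
    private final Map<String, List<Integer>> bySlot = new HashMap<>();

    GpuIndexAllocator(int totalGpus) {
        for (int i = 0; i < totalGpus; i++) {
            free.add(i);
        }
    }

    // Hand out `count` concrete GPU indexes to the given slot, lowest free index first.
    synchronized List<Integer> allocate(String slotId, int count) {
        if (count > free.size()) {
            throw new IllegalStateException(
                "Requested " + count + " GPUs but only " + free.size() + " are free");
        }
        List<Integer> granted = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            granted.add(free.poll());
        }
        bySlot.computeIfAbsent(slotId, k -> new ArrayList<>()).addAll(granted);
        return granted;
    }

    // Return all GPU indexes held by the slot to the free pool.
    synchronized void release(String slotId) {
        List<Integer> held = bySlot.remove(slotId);
        if (held != null) {
            free.addAll(held);
        }
    }
}
```

For memory no such bookkeeping is needed, because the JVM / OS picks the
addresses; here the allocator is the only place that knows which indexes
are already taken.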
> >
> > IMO, unless we say Flink will not support slot-level GPU slicing at least
> > in the foreseeable future, I don't see a good way to avoid touching the TM
> > core. To that end, I think Becket's suggestion points to a good direction,
> > that supports more features (GPU, FPGA, etc.) with less coupling to the TM
> > core (only needs to understand the general interfaces). The detailed
> > implementation for specific resource types can even be encapsulated as a
> > library.
> >
> > @Becket
> >
> > Thanks for sharing your thoughts on the final state. Regardless of the details
> > of how the interfaces should look, I think this is a really good abstraction
> > for supporting general resource types.
> >
> > I'd like to further clarify that, the following three things are all that
> > the "Flink core" needs to understand.
> >
> >    - The *amount* of resource, for scheduling. Actually, we already have
> >    the Resource class in ResourceProfile and ResourceSpec for extended
> >    resource. It's just not really used.
> >    - The *info*, that Flink provides to the operators / user codes.
> >    - The *provider*, which generates the info based on the amount.
> >
> > The "core" does not need to understand the specific implementation details
> > of the above three. They can even be implemented in a 3rd-party library,
> > similar to how we allow users to define their custom MetricReporter.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <be...@gmail.com> wrote:
> >
> > > Thanks for the comment, Stephan.
> > >
> > > >   - If everything becomes a "core feature", it will make the project hard
> > > > to develop in the future. Thinking "library" / "plugin" / "extension" style
> > > > where possible helps.
> > >
> > >
> > > Completely agree. It is much more important to design a mechanism than to
> > > focus on a specific case. Here is what I am thinking of to fully support
> > > custom resource management:
> > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
> > > resource and the amount required. They will be used to find suitable TM
> > > slots to run the tasks. At this point, the resources are only measured by
> > > amount, i.e. they do not have individual IDs.
> > >
> > > 2. On the TM side, have something like *"ResourceInfoProvider"* to identify
> > > the individual resources and provide their detailed information, e.g. the GPU
> > > ID. It is important because the operator may have to explicitly interact
> > > with the physical resource it uses. The ResourceInfoProvider might look
> > > something like the following.
> > > interface ResourceInfoProvider<INFO> {
> > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > >         OperatorId opId, ResourceProfile resourceProfile);
> > > }
> > >
> > > - There could be several "*ResourceInfoProvider*" configured on the TM to
> > > retrieve the information for different resources.
> > > - The TM will be responsible to assign those individual resources to each
> > > operator according to their requested amount.
> > > - The operators will be able to get the ResourceInfo from their
> > > RuntimeContext.
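
A self-contained sketch of how such a provider could be used is shown
below. It is an assumption-laden illustration, not Flink code: String
stands in for AbstractID and OperatorId, an int amount stands in for
ResourceProfile, and GpuInfo, GpuInfoProvider and ResourceInfoRegistry are
hypothetical names. The registry plays the role of the TM holding several
providers, one per resource type.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the quoted interface (String ids, int amount).
interface ResourceInfoProvider<INFO> {
    Map<String, INFO> retrieveResourceInfo(String operatorId, int amount);
}

// Hypothetical info object an operator would receive for one GPU.
final class GpuInfo {
    final int deviceIndex;

    GpuInfo(int deviceIndex) {
        this.deviceIndex = deviceIndex;
    }
}

// Hypothetical GPU implementation: hands out device indexes in ascending order.
final class GpuInfoProvider implements ResourceInfoProvider<GpuInfo> {
    private final int totalGpus;
    private int nextFree = 0;

    GpuInfoProvider(int totalGpus) {
        this.totalGpus = totalGpus;
    }

    @Override
    public synchronized Map<String, GpuInfo> retrieveResourceInfo(String operatorId, int amount) {
        if (nextFree + amount > totalGpus) {
            throw new IllegalStateException("Not enough GPUs left for " + operatorId);
        }
        Map<String, GpuInfo> result = new LinkedHashMap<>();
        for (int i = 0; i < amount; i++) {
            int index = nextFree++;
            result.put("gpu-" + index, new GpuInfo(index));
        }
        return result;
    }
}

// Hypothetical TM-side holder of all configured providers, keyed by resource name.
final class ResourceInfoRegistry {
    private final Map<String, ResourceInfoProvider<?>> providers = new HashMap<>();

    void register(String resourceName, ResourceInfoProvider<?> provider) {
        providers.put(resourceName, provider);
    }

    // What an operator would ultimately see through its runtime context.
    Map<String, ?> infoFor(String resourceName, String operatorId, int amount) {
        ResourceInfoProvider<?> provider = providers.get(resourceName);
        if (provider == null) {
            return Collections.emptyMap();
        }
        return provider.retrieveResourceInfo(operatorId, amount);
    }
}
```

The registry never inspects GpuInfo itself, which mirrors the point that
the core only needs to understand the general interface.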
> > >
> > > If we agree this is a reasonable final state, we can adapt the current FLIP
> > > to it. In fact, it does not sound like a big change to me. All the proposed
> > > configuration can stay as is; it is just that Flink itself won't care about
> > > it. Instead, a GPUInfoProvider implementing the ResourceInfoProvider will
> > > use it.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <se...@apache.org> wrote:
> > >
> > > > Hi all!
> > > >
> > > > The main point I wanted to throw into the discussion is the following:
> > > >   - With more and more use cases, more and more tools go into Flink
> > > >   - If everything becomes a "core feature", it will make the project hard
> > > > to develop in the future. Thinking "library" / "plugin" / "extension" style
> > > > where possible helps.
> > > >
> > > >   - A good thought experiment is always: How many future developers have to
> > > > interact with this code (and possibly understand it partially), even if the
> > > > features they touch have nothing to do with GPU support. If many
> > > > contributors to unrelated features will have to touch it and understand it,
> > > > then let's think if there is a different solution. Maybe there is not, but
> > > > then we should be sure why.
> > > >
> > > >   - That led me to raising this issue: If the GPU manager becomes a core
> > > > service in the TaskManager, Environment, RuntimeContext, etc. then everyone
> > > > developing TM and streaming tasks needs to understand the GPU manager. That
> > > > seems oddly specific, is my impression.
> > > >
> > > > Access to configuration seems not the right reason to do that. We should
> > > > expose the Flink configuration from the RuntimeContext anyways.
> > > >
> > > > If GPUs are sliced and assigned during scheduling, there may be reason,
> > > > although it looks that it would belong to the slot then. Is that what we
> > > > are doing here?
> > > >
> > > > Best,
> > > > Stephan
> > > >
> > > >
> > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <to...@gmail.com>
> > > > wrote:
> > > >
> > > > >  Thanks for the feedback, Becket.
> > > > >
> > > > > > IMO, eventually an operator should only see info of GPUs that are dedicated
> > > > > > to it, instead of all GPUs on the machine/container as in the current design.
> > > > > > It does not make sense to let the user who writes a UDF worry about
> > > > > > coordination among multiple operators running on the same machine. And if
> > > > > > we want to limit the GPU info an operator sees, we should not let the
> > > > > > operator instantiate GPUManager, which means we have to expose something
> > > > > > through runtime context, either GPU info or some kind of limited access to
> > > > > > the GPUManager.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <be...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > It probably make sense for us to first agree on the final state.
> > > > > > > More specifically, will the resource info be exposed through
> > > > > > > runtime context eventually?
> > > > > > >
> > > > > > > If that is the final state and we have a seamless migration story
> > > > > > > from this FLIP to that final state, Personally I think it is OK
> > > > > > > to expose the GPU info in the runtime context.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song
> > > > > > > <tonysong820@gmail.com> wrote:
> > > > > > >
> > > > > > > > @Yangze,
> > > > > > > > I think what Stephan means (@Stephan, please correct me if I'm
> > > > > > > > wrong) is that, we might not need to hold and maintain the
> > > > > > > > GPUManager as a service in TaskManagerServices or
> > > > > > > > RuntimeContext. An alternative is to create / retrieve the
> > > > > > > > GPUManager only in the operators that need it, e.g., with a
> > > > > > > > static method `GPUManager.get()`.
> > > > > > > >
> > > > > > > > @Stephan,
> > > > > > > > I agree with you on excluding GPUManager from
> > > > > > > > TaskManagerServices.
> > > > > > > >
> > > > > > > >    - For the first step, where we provide unified TM-level GPU
> > > > > > > >    information to all operators, it should be fine to have
> > > > > > > >    operators access / lazy-initiate GPUManager by themselves.
> > > > > > > >    - In future, we might have some more fine-grained GPU
> > > > > > > >    management, where we need to maintain GPUManager as a
> > > > > > > >    service and put GPU info in slot profiles. But at least for
> > > > > > > >    now it's not necessary to introduce such complexity.
> > > > > > > >
> > > > > > > > However, I have some concerns on excluding GPUManager from
> > > > > > > > RuntimeContext and let operators access it directly.
> > > > > > > >
> > > > > > > >    - Configurations needed for creating the GPUManager is not
> > > > > > > >    always available for operators.
> > > > > > > >    - If later we want to have fine-grained control over GPU
> > > > > > > >    (e.g., operators in each slot can only see GPUs reserved
> > > > > > > >    for that slot), the approach cannot be easily extended.
> > > > > > > >
> > > > > > > > I would suggest to wrap the GPUManager behind RuntimeContext
> > > > > > > > and only expose the GPUInfo to users. For now, we can declare a
> > > > > > > > method `getGPUInfo()` in RuntimeContext, with a default
> > > > > > > > definition that calls `GPUManager.get()` to get the
> > > > > > > > lazily-created GPUManager. If later we want to create /
> > > > > > > > retrieve GPUManager in a different way, we can simply change
> > > > > > > > how `getGPUInfo` is implemented, without needing to change any
> > > > > > > > public interfaces.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <ka...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > @Stephan
> > > > > > > > > Do you mean MiniCluster? Yes, it makes sense to share the GPU
> > > > > > > > > Manager in such a scenario.
> > > > > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > > > > GPUManager(ExternalResourceManagers) in TaskExecutor instead
> > > > > > > > > of TaskManagerServices.
> > > > > > > > >
> > > > > > > > > Regarding the RuntimeContext/FunctionContext, it just holds
> > > > > > > > > the GPU info instead of the GPU Manager. AFAIK, it's the only
> > > > > > > > > place we could pass GPU info to the
> > > > > > > > > RichFunction/UserDefinedFunction.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > > > isaac@paddlesoft.net
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org
> > wrote
> > > > > ----
> > > > > > > > >
> > > > > > > > > > > Can we somehow keep this out of the TaskManager services
> > > > > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > > > > ExternalServicesManagers in future) is conceptually one of
> > > the
> > > > > task
> > > > > > > > > > manager services, just like MemoryManager before 1.10.
> > > > > > > > > > - It maintains/holds the GPU resource at TM level and all
> > of
> > > > the
> > > > > > > > > > operators allocate the GPU resources from it. So, it should
> > > be
> > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > - We could add a collection called ExternalResourceManagers
> > > to
> > > > > hold
> > > > > > > > > > all managers of other external resources in the future.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Can you help me understand why this needs the addition in
> > > > > > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > > > > > Are you worried about the case when multiple Task Executors
> > > > > > > > > > run in the same JVM? That's not common, but wouldn't it
> > > > > > > > > > actually be good in that case to share the GPU Manager,
> > > > > > > > > > given that the GPU is shared?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Stephan
> > > > > > > > >
> > > > > > > > > ---------------------------
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > What parts need information about this?
> > > > > > > > > > In this FLIP, operators need the information. Thus, we
> > expose
> > > > GPU
> > > > > > > > > > information to the RuntimeContext/FunctionContext. The slot
> > > > > profile
> > > > > > > is
> > > > > > > > > > not aware of GPU resources as GPU is TM level resource now.
> > > > > > > > > >
> > > > > > > > > > > Can the GPU Manager be a "self contained" thing that
> > simply
> > > > > takes
> > > > > > > the
> > > > > > > > > > configuration, and then abstracts everything internally?
> > > > > > > > > > Yes, we just pass the path/args of the discover script and
> > > how
> > > > > many
> > > > > > > > > > GPUs per TM to it. It takes the responsibility to get the
> > GPU
> > > > > > > > > > information and expose them to the
> > > > RuntimeContext/FunctionContext
> > > > > > of
> > > > > > > > > > Operators. Meanwhile, we'd better not allow operators to
> > > > directly
> > > > > > > > > > access GPUManager, it should get what they want from
> > Context.
> > > > We
> > > > > > > could
> > > > > > > > > > then decouple the interface/implementation of GPUManager
> > and
> > > > > Public
> > > > > > > > > > API.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> > > sewen@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > It sounds fine to initially start with GPU specific
> > support
> > > > and
> > > > > > > think
> > > > > > > > > > about
> > > > > > > > > > > generalizing this once we better understand the space.
> > > > > > > > > > >
> > > > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > > > - Can we somehow keep this out of the TaskManager
> > services?
> > > > > > > Anything
> > > > > > > > we
> > > > > > > > > > > have to pull through all layers of the TM makes the TM
> > > > > components
> > > > > > > yet
> > > > > > > > > > more
> > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > >
> > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > -> do the slot profiles need information about the GPU?
> > > > > > > > > > > -> Can the GPU Manager be a "self contained" thing that
> > > > simply
> > > > > > > takes
> > > > > > > > > > > the configuration, and then abstracts everything
> > > internally?
> > > > > > > > Operators
> > > > > > > > > > can
> > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > > > karmagyz@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > >
> > > > > > > > > > > > @Becket
> > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add
> > > > them
> > > > > to
> > > > > > > the
> > > > > > > > > > > > Public API section.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > Regarding the general extended resource mechanism, I
> > > second
> > > > > > > > Xintong's
> > > > > > > > > > > > suggestion.
> > > > > > > > > > > > - It's better to leverage ResourceProfile and
> > > ResourceSpec
> > > > > > after
> > > > > > > we
> > > > > > > > > > > > supporting fine-grained GPU scheduling. As a first step
> > > > > > > proposal, I
> > > > > > > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I
> > > > understand
> > > > > > > > > > > > correctly, it just a code refactoring atm, we could
> > > extract
> > > > > the
> > > > > > > > > > > > open/close/allocateExtendResources of GPUManager to
> > that
> > > > > > > > interface. If
> > > > > > > > > > > > that is the case, +1 to do it during implementation.
> > > > > > > > > > > >
> > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > As Xintong said, we looked into how Spark supports a
> > > > general
> > > > > > > > "Custom
> > > > > > > > > > > > Resource Scheduling" before and decided to introduce a
> > > > common
> > > > > > > > resource
> > > > > > > > > > > > configuration
> > > > > > > > > > > >
> > > > > > >
> > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > to make it more extensible. I think the "resource" is a
> > > > > proper
> > > > > > > > level
> > > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > > > > hxbks2ks@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > >
> > > > > > > > > > > > > There is no doubt that GPU resource management
> > support
> > > > will
> > > > > > > > greatly
> > > > > > > > > > > > > facilitate the development of AI-related applications
> > > by
> > > > > > > PyFlink
> > > > > > > > > > users.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding the names of several GPU configurations, I
> > > > think
> > > > > it
> > > > > > > is
> > > > > > > > > > better
> > > > > > > > > > > > to
> > > > > > > > > > > > > delete the resource field makes it consistent with
> > the
> > > > > names
> > > > > > of
> > > > > > > > other
> > > > > > > > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > > > > > > > >
> > > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path
> > ->
> > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song <to...@gmail.com> 于2020年3月4日周三
> > > > > 上午10:39写道:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline
> > > > > discussion
> > > > > > > > about
> > > > > > > > > > > > making
> > > > > > > > > > > > > > the "GPU Support" as some general "Extended
> > Resource
> > > > > > > Support".
> > > > > > > > We
> > > > > > > > > > > > believe
> > > > > > > > > > > > > > supporting extended resources in a general
> > mechanism
> > > is
> > > > > > > > definitely
> > > > > > > > > > a
> > > > > > > > > > > > good
> > > > > > > > > > > > > > and extensible way. The reason we propose this FLIP
> > > > > > narrowing
> > > > > > > > its
> > > > > > > > > > scope
> > > > > > > > > > > > > > down to GPU alone, is mainly for the concern on
> > extra
> > > > > > efforts
> > > > > > > > and
> > > > > > > > > > > > review
> > > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > To come up with a well design on a general extended
> > > > > > resource
> > > > > > > > > > management
> > > > > > > > > > > > > > mechanism, we would need to investigate more on how
> > > > > people
> > > > > > > use
> > > > > > > > > > > > different
> > > > > > > > > > > > > > kind of resources in practice. For GPU, we learnt
> > > such
> > > > > > > > knowledge
> > > > > > > > > > from
> > > > > > > > > > > > the
> > > > > > > > > > > > > > experts, Becket and his team members. But for FPGA,
> > > or
> > > > > > other
> > > > > > > > > > potential
> > > > > > > > > > > > > > extended resources, we don't have such convenient
> > > > > > information
> > > > > > > > > > sources,
> > > > > > > > > > > > > > making the investigation requires more efforts,
> > > which I
> > > > > > tend
> > > > > > > to
> > > > > > > > > > think
> > > > > > > > > > > > is
> > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On the other hand, we also looked into how Spark
> > > > > supports a
> > > > > > > > general
> > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > Resource Scheduling". Assuming we want to have a
> > > > similar
> > > > > > > > general
> > > > > > > > > > > > extended
> > > > > > > > > > > > > > resource mechanism in the future, we believe that
> > the
> > > > > > current
> > > > > > > > GPU
> > > > > > > > > > > > support
> > > > > > > > > > > > > > design can be easily extended, in an incremental
> > way
> > > > > > without
> > > > > > > > too
> > > > > > > > > > many
> > > > > > > > > > > > > > reworks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - The most important part is probably user interfaces. Spark
> > > > > > > > > > > > > > > offers configuration options to define the amount, discovery
> > > > > > > > > > > > > > > script and vendor (on k8s) on a per resource type basis [1],
> > > > > > > > > > > > > > > which is very similar to what we proposed in this FLIP. I
> > > > > > > > > > > > > > > think it's not necessary to expose config options in the
> > > > > > > > > > > > > > > general way atm, since we do not have support for other
> > > > > > > > > > > > > > > resource types now. If later we decide to have per resource
> > > > > > > > > > > > > > > type config options, we can have backwards compatibility on
> > > > > > > > > > > > > > > the current proposed options with simple key mapping.
> > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can change it to an
> > > > > > > > > > > > > > > "Extended Resource Manager" (or whatever it is called). That
> > > > > > > > > > > > > > > should be a pure component-internal refactoring.
> > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are already
> > > > > > > > > > > > > > > fields for general extended resource. We can of course
> > > > > > > > > > > > > > > leverage them when supporting fine grained GPU scheduling.
> > > > > > > > > > > > > > > That is also not in the scope of this first step proposal,
> > > > > > > > > > > > > > > and would require FLIP-56 to be finished first.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate
> > > > > > > > > > > > > > > FLIP for the general extended resource mechanism, and keep it
> > > > > > > > > > > > > > > in mind when discussing and implementing the current one.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > That's a good point, Stephan. It makes total
> > sense
> > > to
> > > > > > > > generalize
> > > > > > > > > > the
> > > > > > > > > > > > > > > resource management to support custom resources.
> > > > Having
> > > > > > > that
> > > > > > > > > > allows
> > > > > > > > > > > > users
> > > > > > > > > > > > > > > to add new resources by themselves. The general
> > > > > resource
> > > > > > > > > > management
> > > > > > > > > > > > may
> > > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. The custom resource type definition. It is
> > > > supported
> > > > > > by
> > > > > > > > the
> > > > > > > > > > > > extended
> > > > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec.
> > This
> > > > > will
> > > > > > > > likely
> > > > > > > > > > cover
> > > > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how
> > > to
> > > > > > assign
> > > > > > > > the
> > > > > > > > > > > > resources
> > > > > > > > > > > > > > > to different tasks, operators, and so on. This
> > may
> > > > > > require
> > > > > > > > two
> > > > > > > > > > > > levels /
> > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put
> > > > into
> > > > > > > > > > suitable
> > > > > > > > > > > > > > slots.
> > > > > > > > > > > > > > > It is done by the global RM and is not
> > customizable
> > > > > right
> > > > > > > > now.
> > > > > > > > > > > > > > > b. Operator level - map the exact resource to the
> > > > > > operators
> > > > > > > > > > in
> > > > > > > > > > > > TM.
> > > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This
> > > step
> > > > > is
> > > > > > > > needed
> > > > > > > > > > > > assuming
> > > > > > > > > > > > > > > the global RM does not distinguish individual
> > > > resources
> > > > > > of
> > > > > > > > the
> > > > > > > > > > same
> > > > > > > > > > > > type.
> > > > > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it
> > > > > should
> > > > > > > > > > discover the
> > > > > > > > > > > > > > > physical GPU information and bind/match them to
> > > each
> > > > > > > > operators.
> > > > > > > > > > > > Making
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > general will fill in the missing piece to support
> > > > > custom
> > > > > > > > resource
> > > > > > > > > > > > type
> > > > > > > > > > > > > > > definition. But I'd avoid calling it a "External
> > > > > Resource
> > > > > > > > > > Manager" to
> > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > > confusion with RM, maybe something like "Operator
> > > > > > Resource
> > > > > > > > > > Assigner"
> > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > be more accurate. So for each resource type users
> > > can
> > > > > > have
> > > > > > > an
> > > > > > > > > > > > optional
> > > > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For
> > memory,
> > > > > users
> > > > > > > > don't
> > > > > > > > > > need
> > > > > > > > > > > > > > this,
> > > > > > > > > > > > > > > but for other extended resources, users may need
> > > > that.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource
> > > > > > Assigner"
> > > > > > > > is
> > > > > > > > > > > > achievable
> > > > > > > > > > > > > > > in this FLIP. But I am also OK with having that
> > in
> > > a
> > > > > > > separate
> > > > > > > > > > FLIP
> > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > the interface between the "Operator Resource
> > > > Assigner"
> > > > > > and
> > > > > > > > > > operator
> > > > > > > > > > > > may
> > > > > > > > > > > > > > > take a while to settle down if we want to make it
> > > > > > generic.
> > > > > > > > But I
> > > > > > > > > > > > think
> > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > implementation should take this future work into
> > > > > > > > consideration so
> > > > > > > > > > > > that we
> > > > > > > > > > > > > > > don't need to break backwards compatibility once
> > we
> > > > > have
> > > > > > > > that.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > > > > > > > sewen@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I cannot really give much input into the
> > > mechanics
> > > > of
> > > > > > > > GPU-aware
> > > > > > > > > > > > > > > scheduling
> > > > > > > > > > > > > > > > and GPU allocation, as I have no experience
> > with
> > > > > that.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > One thought I had when reading the proposal is
> > if
> > > > it
> > > > > > > makes
> > > > > > > > > > sense to
> > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > the "GPU Manager" as an "External Resource
> > > > Manager",
> > > > > > and
> > > > > > > > GPU
> > > > > > > > > > is one
> > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > > > The way I understand the ResourceProfile and
> > > > > > > ResourceSpec,
> > > > > > > > > > that is
> > > > > > > > > > > > how
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > It has the advantage that it looks more
> > > extensible.
> > > > > > Maybe
> > > > > > > > > > there is
> > > > > > > > > > > > a
> > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource,
> > and
> > > > FPGA
> > > > > > > > > > Resource, a
> > > > > > > > > > > > > > Alibaba
> > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource
> > > > management
> > > > > > > > support
> > > > > > > > > > is a
> > > > > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > > > > for machine learning use cases. Actually it
> > is
> > > > one
> > > > > of
> > > > > > > the
> > > > > > > > > > mostly
> > > > > > > > > > > > > > asked
> > > > > > > > > > > > > > > > > question from the users who are interested in
> > > > using
> > > > > > > Flink
> > > > > > > > > > for ML.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also
> > be
> > > > > > > > mentioned in
> > > > > > > > > > the
> > > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info
> > > > also a
> > > > > > > > public
> > > > > > > > > > API?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song
> > <
> > > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking
> > off
> > > > the
> > > > > > > > > > discussion,
> > > > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting using
> > of
> > > > GPU
> > > > > in
> > > > > > > > Flink
> > > > > > > > > > is
> > > > > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it
> > looks
> > > > good
> > > > > > to
> > > > > > > > me. I
> > > > > > > > > > > > think
> > > > > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > > > > very good first step for Flink's GPU
> > > supports.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo
> > <
> > > > > > > > > > karmagyz@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > We would like to start a discussion
> > thread
> > > on
> > > > > > > > "FLIP-108:
> > > > > > > > > > Add
> > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following
> > > > > issues:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs
> > > in a
> > > > > > task
> > > > > > > > > > executor
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > forward such requirements to the external
> > > > > > resource
> > > > > > > > > > managers
> > > > > > > > > > > > (for
> > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > > - Provide information of available GPU
> > > > > resources
> > > > > > to
> > > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as
> > > > > follows:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to
> > > > > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task
> > > > > manager
> > > > > > > > > > services to
> > > > > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > > > > and expose GPU resource information to
> > the
> > > > > > context
> > > > > > > of
> > > > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > > > - Introduce the default script for GPU
> > > > > discovery,
> > > > > > > in
> > > > > > > > > > which we
> > > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > > > the privilege mode to help user to
> > achieve
> > > > > > > > worker-level
> > > > > > > > > > > > isolation
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki
> > > > > > document
> > > > > > > > [1].
> > > > > > > > > > > > Looking
> > > > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Stephan Ewen <se...@apache.org>.
This sounds good to go ahead from my side.

I like the approach that Becket suggested - in that case the core
abstraction that everyone would need to understand would be "external
resource allocation" and the "ResourceInfoProvider", and the GPU specific
code would be a specific implementation only known to that component that
allocates the external resource. That fits the separation of concerns well.
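
As a rough illustration of that separation, here is a minimal Java sketch.
It is not taken from the FLIP: the names ExternalResourceSketch,
ResourceInfo, ResourceInfoProvider, GpuInfoProvider and ProviderRegistry
are all hypothetical, and the GPU "discovery" is faked with plain indices.
The point is only that the core-side registry works with the provider
interface and a resource name, while everything GPU specific stays inside
one implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Flink under these names.
final class ExternalResourceSketch {

    /** Opaque info handed to operators; the core never inspects it. */
    interface ResourceInfo {
        String describe();
    }

    /** The only abstraction the core would need to understand. */
    interface ResourceInfoProvider {
        /** Produces info objects for the requested amount of the resource. */
        List<ResourceInfo> provide(long amount);
    }

    /** GPU-specific code, known only to this implementation. */
    static final class GpuInfoProvider implements ResourceInfoProvider {
        @Override
        public List<ResourceInfo> provide(long amount) {
            List<ResourceInfo> infos = new ArrayList<>();
            for (long i = 0; i < amount; i++) {
                final long index = i;
                // Assumption: a real provider would run a discovery script here.
                infos.add(() -> "gpu:" + index);
            }
            return infos;
        }
    }

    /** Core-side registry: resource name -> provider, with no GPU knowledge. */
    static final class ProviderRegistry {
        private final Map<String, ResourceInfoProvider> providers = new HashMap<>();

        void register(String resourceName, ResourceInfoProvider provider) {
            providers.put(resourceName, provider);
        }

        /** Returns an empty list when no provider is registered for the name. */
        List<ResourceInfo> infoFor(String resourceName, long amount) {
            ResourceInfoProvider provider = providers.get(resourceName);
            if (provider == null) {
                return new ArrayList<>();
            }
            return provider.provide(amount);
        }
    }

    public static void main(String[] args) {
        ProviderRegistry registry = new ProviderRegistry();
        registry.register("gpu", new GpuInfoProvider());
        for (ResourceInfo info : registry.infoFor("gpu", 2)) {
            System.out.println(info.describe());
        }
    }
}
```

Supporting another resource type in this sketch would mean registering
another provider under a different name, without touching the registry.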

I also understand that it should not be over-engineered in the first
version, so some simplification makes sense, and then gradually expand from
there.

So +1 to go ahead with what was suggested above (Xintong / Becket) from my
side.

On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <to...@gmail.com> wrote:

> Thanks for the comments, Stephan & Becket.
>
> @Stephan
>
> I see your concern, and I completely agree with you that we should first
> think about the "library" / "plugin" / "extension" style if possible.
>
> > If GPUs are sliced and assigned during scheduling, there may be reason,
> > although it looks that it would belong to the slot then. Is that what we
> > are doing here?
>
>
> In the current proposal, we do not have the GPUs sliced and assigned to
> slots, because it could be problematic without dynamic slot allocation.
> E.g., the number of GPUs might not be evenly divisible by the number of
> slots.
>
> I think it makes sense to eventually have the GPUs assigned to slots. Even
> then, we might still need a TM level GPUManager (or ResourceProvider like
> Becket suggested). For memory, in each slot we can simply request the
> amount of memory, leaving it to JVM / OS to decide which memory (address)
> should be assigned. For GPU, and potentially other resources like FPGA, we
> need to explicitly specify which GPU (index) should be used. Therefore, we
> need some component at the TM level to coordinate which slot uses which
> GPU.
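
As a rough illustration of the TM-level coordination described above, the
following Java sketch tracks which slot holds which GPU index. It is not
part of the FLIP: GpuSlotCoordinator, allocate and release are hypothetical
names, and slots are identified by plain strings for simplicity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical TM-level component that decides which slot uses which GPU index. */
final class GpuSlotCoordinator {
    private final Deque<Integer> freeIndices = new ArrayDeque<>();
    private final Map<String, List<Integer>> assignedBySlot = new HashMap<>();

    GpuSlotCoordinator(int gpuCount) {
        for (int i = 0; i < gpuCount; i++) {
            freeIndices.addLast(i);
        }
    }

    /** Reserves {@code amount} concrete GPU indices for the given slot. */
    synchronized List<Integer> allocate(String slotId, int amount) {
        if (amount < 0 || amount > freeIndices.size()) {
            throw new IllegalStateException(
                "Requested " + amount + " GPUs but only " + freeIndices.size() + " are free");
        }
        List<Integer> granted = new ArrayList<>();
        for (int i = 0; i < amount; i++) {
            granted.add(freeIndices.removeFirst());
        }
        assignedBySlot.computeIfAbsent(slotId, k -> new ArrayList<>()).addAll(granted);
        return Collections.unmodifiableList(granted);
    }

    /** Returns all indices held by the slot to the free pool. */
    synchronized void release(String slotId) {
        List<Integer> indices = assignedBySlot.remove(slotId);
        if (indices != null) {
            freeIndices.addAll(indices);
        }
    }
}
```

Because each slot asks for an explicit amount, the sketch does not require
the number of GPUs to be evenly divisible by the number of slots; a request
simply fails when not enough indices are free.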
>
> IMO, unless we say Flink will not support slot-level GPU slicing at least
> in the foreseeable future, I don't see a good way to avoid touching the TM
> core. To that end, I think Becket's suggestion points to a good direction,
> that supports more features (GPU, FPGA, etc.) with less coupling to the TM
> core (only needs to understand the general interfaces). The detailed
> implementation for specific resource types can even be encapsulated as a
> library.
>
> @Becket
>
> Thanks for sharing your thoughts on the final state. Leaving aside the
> details of how the interfaces should look, I think this is a really good
> abstraction for supporting general resource types.
>
> I'd like to further clarify that the following three things are all that
> the "Flink core" needs to understand.
>
>    - The *amount* of resource, for scheduling. Actually, we already have
>    the Resource class in ResourceProfile and ResourceSpec for extended
>    resources. It's just not really used.
>    - The *info* that Flink provides to the operators / user code.
>    - The *provider*, which generates the info based on the amount.
>
> The "core" does not need to understand the specific implementation details
> of the above three. They can even be implemented in a 3rd-party library,
> similar to how we allow users to define their custom MetricReporter.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <be...@gmail.com> wrote:
>
> > Thanks for the comment, Stephan.
> >
> > >   - If everything becomes a "core feature", it will make the project hard
> > > to develop in the future. Thinking "library" / "plugin" / "extension"
> > > style where possible helps.
> >
> >
> > Completely agree. It is much more important to design a mechanism than
> > to focus on a specific case. Here is what I am thinking of to fully support
> > custom resource management:
> > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
> > resource and the amount required. They will be used to find suitable TM
> > slots to run the tasks. At this point, the resources are only measured by
> > amount, i.e. they do not have individual IDs.
> >
> > 2. On the TM side, have something like *"ResourceInfoProvider"* to identify
> > the individual resources and provide detailed information about them, e.g.
> > the GPU ID. This is important because the operator may have to explicitly
> > interact with the physical resource it uses. The ResourceInfoProvider might
> > look something like the following.
> > interface ResourceInfoProvider<INFO> {
> >     Map<AbstractID, INFO> retrieveResourceInfo(OperatorId opId,
> > ResourceProfile resourceProfile);
> > }
> >
> > - There could be several "*ResourceInfoProvider*" instances configured on
> > the TM to retrieve the information for different resources.
> > - The TM will be responsible for assigning those individual resources to
> > each operator according to their requested amount.
> > - The operators will be able to get the ResourceInfo from their
> > RuntimeContext.
> >
> > If we agree this is a reasonable final state, we can adapt the current
> > FLIP to it. In fact it does not sound like a big change to me. All the
> > proposed configuration can stay as is; it is just that Flink itself won't
> > care about it. Instead, a GPUInfoProvider implementing the
> > ResourceInfoProvider will use it.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <se...@apache.org> wrote:
> >
> > > Hi all!
> > >
> > > The main point I wanted to throw into the discussion is the following:
> > >   - With more and more use cases, more and more tools go into Flink
> > >   - If everything becomes a "core feature", it will make the project
> > > hard to develop in the future. Thinking "library" / "plugin" /
> > > "extension" style where possible helps.
> > >
> > >   - A good thought experiment is always: How many future developers
> > > have to interact with this code (and possibly understand it partially),
> > > even if the features they touch have nothing to do with GPU support? If
> > > many contributors to unrelated features will have to touch it and
> > > understand it, then let's think if there is a different solution. Maybe
> > > there is not, but then we should be sure why.
> > >
> > >   - That led me to raising this issue: If the GPU manager becomes a
> > > core service in the TaskManager, Environment, RuntimeContext, etc. then
> > > everyone developing TM and streaming tasks needs to understand the GPU
> > > manager. That seems oddly specific, is my impression.
> > >
> > > Access to configuration seems not the right reason to do that. We
> > > should expose the Flink configuration from the RuntimeContext anyways.
> > >
> > > If GPUs are sliced and assigned during scheduling, there may be a reason,
> > > although it looks that it would belong to the slot then. Is that what
> > > we are doing here?
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > >
> > > >  Thanks for the feedback, Becket.
> > > >
> > > > IMO, eventually an operator should only see info of GPUs that are
> > > > dedicated to it, instead of all GPUs on the machine/container as in the
> > > > current design. It does not make sense to make the user who writes a UDF
> > > > worry about coordination among multiple operators running on the same
> > > > machine. And if we want to limit the GPU info an operator sees, we should
> > > > not let the operator instantiate GPUManager, which means we have to
> > > > expose something through the runtime context, either GPU info or some
> > > > kind of limited access to the GPUManager.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <be...@gmail.com>
> > wrote:
> > > >
> > > > > It probably makes sense for us to first agree on the final state. More
> > > > > specifically, will the resource info be exposed through the runtime
> > > > > context eventually?
> > > > >
> > > > > If that is the final state and we have a seamless migration story from
> > > > > this FLIP to that final state, personally I think it is OK to expose
> > > > > the GPU info in the runtime context.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> tonysong820@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > @Yangze,
> > > > > > I think what Stephan means (@Stephan, please correct me if I'm wrong)
> > > > > > is that we might not need to hold and maintain the GPUManager as a
> > > > > > service in TaskManagerServices or RuntimeContext. An alternative is
> > > > > > to create / retrieve the GPUManager only in the operators that need
> > > > > > it, e.g., with a static method `GPUManager.get()`.
> > > > > >
> > > > > > @Stephan,
> > > > > > I agree with you on excluding GPUManager from TaskManagerServices.
> > > > > >
> > > > > >    - For the first step, where we provide unified TM-level GPU
> > > > > >    information to all operators, it should be fine to have operators
> > > > > >    access / lazily initialize GPUManager by themselves.
> > > > > >    - In future, we might have some more fine-grained GPU management,
> > > > > >    where we need to maintain GPUManager as a service and put GPU info
> > > > > >    in slot profiles. But at least for now it's not necessary to
> > > > > >    introduce such complexity.
> > > > > >
> > > > > > However, I have some concerns on excluding GPUManager from
> > > > > > RuntimeContext and letting operators access it directly.
> > > > > >
> > > > > >    - Configurations needed for creating the GPUManager are not always
> > > > > >    available to operators.
> > > > > >    - If later we want to have fine-grained control over GPUs (e.g.,
> > > > > >    operators in each slot can only see GPUs reserved for that slot),
> > > > > >    the approach cannot be easily extended.
> > > > > >
> > > > > > I would suggest wrapping the GPUManager behind RuntimeContext and
> > > > > > only exposing the GPUInfo to users. For now, we can declare a method
> > > > > > `getGPUInfo()` in RuntimeContext, with a default definition that
> > > > > > calls `GPUManager.get()` to get the lazily-created GPUManager. If
> > > > > > later we want to create / retrieve GPUManager in a different way, we
> > > > > > can simply change how `getGPUInfo` is implemented, without needing to
> > > > > > change any public interfaces.
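
The suggestion above could be sketched roughly as follows. This is a
simplified, self-contained illustration and not Flink code: GPUInfo and
GPUManager are minimal stand-ins for the classes named in the thread,
SketchRuntimeContext is a hypothetical stand-in for RuntimeContext, and the
fixed single GPU replaces whatever a real discovery script would report.

```java
import java.util.Collections;
import java.util.List;

/** Minimal stand-in for the GPU info exposed to users. */
final class GPUInfo {
    final int index;

    GPUInfo(int index) {
        this.index = index;
    }
}

/** Minimal stand-in for the GPUManager; created lazily on first access. */
final class GPUManager {
    private static GPUManager instance;
    private final List<GPUInfo> gpus;

    private GPUManager(List<GPUInfo> gpus) {
        this.gpus = gpus;
    }

    static synchronized GPUManager get() {
        if (instance == null) {
            // Assumption for the sketch: one GPU with index 0 instead of real discovery.
            instance = new GPUManager(Collections.singletonList(new GPUInfo(0)));
        }
        return instance;
    }

    List<GPUInfo> getGPUInfo() {
        return Collections.unmodifiableList(gpus);
    }
}

/** Hypothetical stand-in for RuntimeContext; users only ever see GPUInfo. */
interface SketchRuntimeContext {
    /** Default definition; can later be overridden without changing the signature. */
    default List<GPUInfo> getGPUInfo() {
        return GPUManager.get().getGPUInfo();
    }
}
```

User code in this sketch calls only getGPUInfo(), so replacing the static
lookup with a managed service later would not touch the public method.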
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <ka...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > @Stephan
> > > > > > > Do you mean MiniCluster? Yes, it makes sense to share the GPU Manager
> > > > > > > in such a scenario.
> > > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > > GPUManager (ExternalResourceManagers) in TaskExecutor instead of
> > > > > > > TaskManagerServices.
> > > > > > >
> > > > > > > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > > > > > > info instead of the GPU Manager. AFAIK, it's the only place we could
> > > > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > > isaac@paddlesoft.net
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org
> wrote
> > > > ----
> > > > > > > >
> > > > > > > > > > Can we somehow keep this out of the TaskManager services
> > > > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > > > ExternalServicesManagers in future) is conceptually one of
> > the
> > > > task
> > > > > > > > > manager services, just like MemoryManager before 1.10.
> > > > > > > > > - It maintains/holds the GPU resource at TM level and all
> of
> > > the
> > > > > > > > > operators allocate the GPU resources from it. So, it should
> > be
> > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > - We could add a collection called ExternalResourceManagers
> > to
> > > > hold
> > > > > > > > > all managers of other external resources in the future.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Can you help me understand why this needs the addition in
> > > > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > > > Are you worried about the case when multiple Task Executors
> run
> > > in
> > > > > the
> > > > > > > same
> > > > > > > > JVM? That's not common, but wouldn't it actually be good in
> > that
> > > > case
> > > > > > to
> > > > > > > > share the GPU Manager, given that the GPU is shared?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Stephan
> > > > > > > >
> > > > > > > > ---------------------------
> > > > > > > >
> > > > > > > >
> > > > > > > > > What parts need information about this?
> > > > > > > > > In this FLIP, operators need the information. Thus, we
> expose
> > > GPU
> > > > > > > > > information to the RuntimeContext/FunctionContext. The slot
> > > > profile
> > > > > > is
> > > > > > > > > not aware of GPU resources as GPU is TM level resource now.
> > > > > > > > >
> > > > > > > > > > Can the GPU Manager be a "self contained" thing that
> simply
> > > > takes
> > > > > > the
> > > > > > > > > configuration, and then abstracts everything internally?
> > > > > > > > > Yes, we just pass the path/args of the discover script and
> > how
> > > > many
> > > > > > > > > GPUs per TM to it. It takes the responsibility to get the
> GPU
> > > > > > > > > information and expose them to the
> > > RuntimeContext/FunctionContext
> > > > > of
> > > > > > > > > Operators. Meanwhile, we'd better not allow operators to
> > > directly
> > > > > > > > > access GPUManager, it should get what they want from
> Context.
> > > We
> > > > > > could
> > > > > > > > > then decouple the interface/implementation of GPUManager
> and
> > > > Public
> > > > > > > > > API.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> > sewen@apache.org
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > It sounds fine to initially start with GPU specific
> support
> > > and
> > > > > > think
> > > > > > > > > about
> > > > > > > > > > generalizing this once we better understand the space.
> > > > > > > > > >
> > > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > > - Can we somehow keep this out of the TaskManager
> services?
> > > > > > Anything
> > > > > > > we
> > > > > > > > > > have to pull through all layers of the TM makes the TM
> > > > components
> > > > > > yet
> > > > > > > > > more
> > > > > > > > > > complex and harder to maintain.
> > > > > > > > > >
> > > > > > > > > > - What parts need information about this?
> > > > > > > > > > -> do the slot profiles need information about the GPU?
> > > > > > > > > > -> Can the GPU Manager be a "self contained" thing that
> > > simply
> > > > > > takes
> > > > > > > > > > the configuration, and then abstracts everything
> > internally?
> > > > > > > Operators
> > > > > > > > > can
> > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > > karmagyz@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for all the feedbacks.
> > > > > > > > > > >
> > > > > > > > > > > @Becket
> > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add
> > > them
> > > > to
> > > > > > the
> > > > > > > > > > > Public API section.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > Regarding the general extended resource mechanism, I
> > second
> > > > > > > Xintong's
> > > > > > > > > > > suggestion.
> > > > > > > > > > > - It's better to leverage ResourceProfile and
> > ResourceSpec
> > > > > after
> > > > > > we
> > > > > > > > > > > supporting fine-grained GPU scheduling. As a first step
> > > > > > proposal, I
> > > > > > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > > > > > - Regarding the "Extended Resource Manager", if I
> > > understand
> > > > > > > > > > > correctly, it just a code refactoring atm, we could
> > extract
> > > > the
> > > > > > > > > > > open/close/allocateExtendResources of GPUManager to
> that
> > > > > > > interface. If
> > > > > > > > > > > that is the case, +1 to do it during implementation.
> > > > > > > > > > >
> > > > > > > > > > > @Xingbo
> > > > > > > > > > > As Xintong said, we looked into how Spark supports a general
> > > > > > > > > > > "Custom Resource Scheduling" before and decided to introduce a
> > > > > > > > > > > common resource configuration schema
> > > > > > > > > > > (taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > to make it more extensible. I think "resource" is a proper level
> > > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Yangze Guo
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > > > hxbks2ks@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > >
> > > > > > > > > > > > There is no doubt that GPU resource management
> support
> > > will
> > > > > > > greatly
> > > > > > > > > > > > facilitate the development of AI-related applications
> > by
> > > > > > PyFlink
> > > > > > > > > users.
> > > > > > > > > > > >
> > > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > > >
> > > > > > > > > > > > Regarding the names of several GPU configurations, I
> > > think
> > > > it
> > > > > > is
> > > > > > > > > better
> > > > > > > > > > > to
> > > > > > > > > > > > delete the resource field makes it consistent with
> the
> > > > names
> > > > > of
> > > > > > > other
> > > > > > > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > > > > > > >
> > > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path
> ->
> > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > >
> > > > > > > > > > > > Xingbo
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song <to...@gmail.com> wrote on Wed, Mar 4, 2020 at 10:39 AM:
> > > > > > > > > > > >
> > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline
> > > > discussion
> > > > > > > about
> > > > > > > > > > > making
> > > > > > > > > > > > > the "GPU Support" as some general "Extended
> Resource
> > > > > > Support".
> > > > > > > We
> > > > > > > > > > > believe
> > > > > > > > > > > > > supporting extended resources in a general
> mechanism
> > is
> > > > > > > definitely
> > > > > > > > > a
> > > > > > > > > > > good
> > > > > > > > > > > > > and extensible way. The reason we propose this FLIP
> > > > > narrowing
> > > > > > > its
> > > > > > > > > scope
> > > > > > > > > > > > > down to GPU alone, is mainly for the concern on
> extra
> > > > > efforts
> > > > > > > and
> > > > > > > > > > > review
> > > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To come up with a well design on a general extended
> > > > > resource
> > > > > > > > > management
> > > > > > > > > > > > > mechanism, we would need to investigate more on how
> > > > people
> > > > > > use
> > > > > > > > > > > different
> > > > > > > > > > > > > kind of resources in practice. For GPU, we learnt
> > such
> > > > > > > knowledge
> > > > > > > > > from
> > > > > > > > > > > the
> > > > > > > > > > > > > experts, Becket and his team members. But for FPGA,
> > or
> > > > > other
> > > > > > > > > potential
> > > > > > > > > > > > > extended resources, we don't have such convenient
> > > > > information
> > > > > > > > > sources,
> > > > > > > > > > > > > making the investigation requires more efforts,
> > which I
> > > > > tend
> > > > > > to
> > > > > > > > > think
> > > > > > > > > > > is
> > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On the other hand, we also looked into how Spark
> > > > supports a
> > > > > > > general
> > > > > > > > > > > "Custom
> > > > > > > > > > > > > Resource Scheduling". Assuming we want to have a
> > > similar
> > > > > > > general
> > > > > > > > > > > extended
> > > > > > > > > > > > > resource mechanism in the future, we believe that
> the
> > > > > current
> > > > > > > GPU
> > > > > > > > > > > support
> > > > > > > > > > > > > design can be easily extended, in an incremental
> way
> > > > > without
> > > > > > > too
> > > > > > > > > many
> > > > > > > > > > > > > reworks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > - The most important part is probably user interfaces. Spark offers
> > > > > > > > > > > > > configuration options to define the amount, discovery script and
> > > > > > > > > > > > > vendor (on k8s) on a per-resource-type basis [1], which is very
> > > > > > > > > > > > > similar to what we proposed in this FLIP. I think it's not necessary
> > > > > > > > > > > > > to expose config options in the general way atm, since we do not
> > > > > > > > > > > > > have support for other resource types now. If later we decide to
> > > > > > > > > > > > > have per-resource-type config options, we can keep backwards
> > > > > > > > > > > > > compatibility with the currently proposed options through simple
> > > > > > > > > > > > > key mapping.
> > > > > > > > > > > > > - For the GPU Manager, if later needed we can
> change
> > it
> > > > to
> > > > > a
> > > > > > > > > > > "Extended
> > > > > > > > > > > > > Resource Manager" (or whatever it is called). That
> > > should
> > > > > be
> > > > > > a
> > > > > > > > > pure
> > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are
> > > already
> > > > > > > > > fields for
> > > > > > > > > > > > > general extended resource. We can of course
> leverage
> > > them
> > > > > > when
> > > > > > > > > > > > > supporting
> > > > > > > > > > > > > fine grained GPU scheduling. That is also not in
> the
> > > > scope
> > > > > of
> > > > > > > > > this
> > > > > > > > > > > first
> > > > > > > > > > > > > step proposal, and would require FLIP-56 to be
> > finished
> > > > > > first.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To sum up, I agree with Becket that we should have a separate FLIP
> > > > > > > > > > > > > for the general extended resource mechanism, and keep it in mind
> > > > > > > > > > > > > when discussing and implementing the current one.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > That's a good point, Stephan. It makes total
> sense
> > to
> > > > > > > generalize
> > > > > > > > > the
> > > > > > > > > > > > > > resource management to support custom resources.
> > > Having
> > > > > > that
> > > > > > > > > allows
> > > > > > > > > > > users
> > > > > > > > > > > > > > to add new resources by themselves. The general
> > > > resource
> > > > > > > > > management
> > > > > > > > > > > may
> > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. The custom resource type definition. It is
> > > supported
> > > > > by
> > > > > > > the
> > > > > > > > > > > extended
> > > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec.
> This
> > > > will
> > > > > > > likely
> > > > > > > > > cover
> > > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how
> > to
> > > > > assign
> > > > > > > the
> > > > > > > > > > > resources
> > > > > > > > > > > > > > to different tasks, operators, and so on. This
> may
> > > > > require
> > > > > > > two
> > > > > > > > > > > levels /
> > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put
> > > into
> > > > > > > > > suitable
> > > > > > > > > > > > > slots.
> > > > > > > > > > > > > > It is done by the global RM and is not
> customizable
> > > > right
> > > > > > > now.
> > > > > > > > > > > > > > b. Operator level - map the exact resource to the
> > > > > operators
> > > > > > > > > in
> > > > > > > > > > > TM.
> > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This
> > step
> > > > is
> > > > > > > needed
> > > > > > > > > > > assuming
> > > > > > > > > > > > > > the global RM does not distinguish individual
> > > resources
> > > > > of
> > > > > > > the
> > > > > > > > > same
> > > > > > > > > > > type.
> > > > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it
> > > > should
> > > > > > > > > discover the
> > > > > > > > > > > > > > physical GPU information and bind/match them to
> > each
> > > > > > > operators.
> > > > > > > > > > > Making
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > general will fill in the missing piece to support
> > > > custom
> > > > > > > resource
> > > > > > > > > > > type
> > > > > > > > > > > > > > definition. But I'd avoid calling it a "External
> > > > Resource
> > > > > > > > > Manager" to
> > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > confusion with RM, maybe something like "Operator
> > > > > Resource
> > > > > > > > > Assigner"
> > > > > > > > > > > > > would
> > > > > > > > > > > > > > be more accurate. So for each resource type users
> > can
> > > > > have
> > > > > > an
> > > > > > > > > > > optional
> > > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For
> memory,
> > > > users
> > > > > > > don't
> > > > > > > > > need
> > > > > > > > > > > > > this,
> > > > > > > > > > > > > > but for other extended resources, users may need
> > > that.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Personally I think a pluggable "Operator Resource
> > > > > Assigner"
> > > > > > > is
> > > > > > > > > > > achievable
> > > > > > > > > > > > > > in this FLIP. But I am also OK with having that
> in
> > a
> > > > > > separate
> > > > > > > > > FLIP
> > > > > > > > > > > > > because
> > > > > > > > > > > > > > the interface between the "Operator Resource
> > > Assigner"
> > > > > and
> > > > > > > > > operator
> > > > > > > > > > > may
> > > > > > > > > > > > > > take a while to settle down if we want to make it
> > > > > generic.
> > > > > > > But I
> > > > > > > > > > > think
> > > > > > > > > > > > > our
> > > > > > > > > > > > > > implementation should take this future work into
> > > > > > > consideration so
> > > > > > > > > > > that we
> > > > > > > > > > > > > > don't need to break backwards compatibility once
> we
> > > > have
> > > > > > > that.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > > > > > > sewen@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I cannot really give much input into the
> > mechanics
> > > of
> > > > > > > GPU-aware
> > > > > > > > > > > > > > scheduling
> > > > > > > > > > > > > > > and GPU allocation, as I have no experience
> with
> > > > that.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One thought I had when reading the proposal is
> if
> > > it
> > > > > > makes
> > > > > > > > > sense to
> > > > > > > > > > > > > look
> > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > the "GPU Manager" as an "External Resource
> > > Manager",
> > > > > and
> > > > > > > GPU
> > > > > > > > > is one
> > > > > > > > > > > > > such
> > > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > > The way I understand the ResourceProfile and
> > > > > > ResourceSpec,
> > > > > > > > > that is
> > > > > > > > > > > how
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > It has the advantage that it looks more
> > extensible.
> > > > > Maybe
> > > > > > > > > there is
> > > > > > > > > > > a
> > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource,
> and
> > > FPGA
> > > > > > > > > Resource, a
> > > > > > > > > > > > > Alibaba
> > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the FLIP, Yangze. GPU resource management support is a
> > > > > > > > > > > > > > > > must-have for machine learning use cases. Actually it is one of the
> > > > > > > > > > > > > > > > most frequently asked questions from users who are interested in
> > > > > > > > > > > > > > > > using Flink for ML.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also
> be
> > > > > > > mentioned in
> > > > > > > > > the
> > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info
> > > also a
> > > > > > > public
> > > > > > > > > API?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song
> <
> > > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking
> off
> > > the
> > > > > > > > > discussion,
> > > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Big +1 for this feature. Supporting using
> of
> > > GPU
> > > > in
> > > > > > > Flink
> > > > > > > > > is
> > > > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it
> looks
> > > good
> > > > > to
> > > > > > > me. I
> > > > > > > > > > > think
> > > > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > > > very good first step for Flink's GPU
> > supports.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo
> <
> > > > > > > > > karmagyz@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > We would like to start a discussion
> thread
> > on
> > > > > > > "FLIP-108:
> > > > > > > > > Add
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > This FLIP mainly discusses the following
> > > > issues:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs
> > in a
> > > > > task
> > > > > > > > > executor
> > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > forward such requirements to the external
> > > > > resource
> > > > > > > > > managers
> > > > > > > > > > > (for
> > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > > - Provide information of available GPU
> > > > resources
> > > > > to
> > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as
> > > > follows:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > - Forward GPU resource requirements to
> > > > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task
> > > > manager
> > > > > > > > > services to
> > > > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > > > and expose GPU resource information to
> the
> > > > > context
> > > > > > of
> > > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > > - Introduce the default script for GPU
> > > > discovery,
> > > > > > in
> > > > > > > > > which we
> > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > > the privilege mode to help user to
> achieve
> > > > > > > worker-level
> > > > > > > > > > > isolation
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki
> > > > > document
> > > > > > > [1].
> > > > > > > > > > > Looking
> > > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Xintong Song <to...@gmail.com>.
Thanks for the comments, Stephan & Becket.

@Stephan

I see your concern, and I completely agree with you that we should first
think about the "library" / "plugin" / "extension" style if possible.

> If GPUs are sliced and assigned during scheduling, there may be reason,
> although it looks that it would belong to the slot then. Is that what we
> are doing here?


In the current proposal, we do not have the GPUs sliced and assigned to
slots, because it could be problematic without dynamic slot allocation.
E.g., the number of GPUs might not be evenly divisible by the number of
slots.

I think it makes sense to eventually have the GPUs assigned to slots. Even
then, we might still need a TM level GPUManager (or ResourceProvider like
Becket suggested). For memory, in each slot we can simply request the
amount of memory, leaving it to JVM / OS to decide which memory (address)
should be assigned. For GPU, and potentially other resources like FPGA, we
need to explicitly specify which GPU (index) should be used. Therefore, we
need some component at the TM level to coordinate which slot uses which
GPU.
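
To illustrate the point above, here is a minimal sketch of what such a
TM-level component could look like. It is not part of the FLIP: the class
name GpuIndexCoordinator and its allocate / release methods are hypothetical,
and slots are identified by plain strings only for the sake of the example.
The point is simply that, unlike memory, a slot gets concrete GPU indices
back, so something has to track which indices are still free.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical TM-level coordinator; not an actual Flink class. */
class GpuIndexCoordinator {
    // GPU indices that are not reserved by any slot yet.
    private final Deque<Integer> free = new ArrayDeque<>();
    // GPU indices currently held by each slot.
    private final Map<String, List<Integer>> bySlot = new HashMap<>();

    GpuIndexCoordinator(int gpuCount) {
        for (int i = 0; i < gpuCount; i++) {
            free.add(i);
        }
    }

    /** Reserves `amount` concrete GPU indices for the slot, or fails if too few are free. */
    synchronized List<Integer> allocate(String slotId, int amount) {
        if (amount > free.size()) {
            throw new IllegalStateException(
                    "Requested " + amount + " GPUs, only " + free.size() + " free");
        }
        List<Integer> assigned = new ArrayList<>();
        for (int i = 0; i < amount; i++) {
            assigned.add(free.poll());
        }
        bySlot.computeIfAbsent(slotId, k -> new ArrayList<>()).addAll(assigned);
        return assigned;
    }

    /** Returns all GPU indices held by the slot to the free pool. */
    synchronized void release(String slotId) {
        List<Integer> held = bySlot.remove(slotId);
        if (held != null) {
            free.addAll(held);
        }
    }
}
```

With 3 GPUs and 2 slots, for example, one slot could take two indices and the
other one; an uneven split like that is exactly what a static per-slot
division cannot express.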

IMO, unless we say Flink will not support slot-level GPU slicing at least
in the foreseeable future, I don't see a good way to avoid touching the TM
core. To that end, I think Becket's suggestion points in a good direction,
one that supports more features (GPU, FPGA, etc.) with less coupling to the
TM core (which only needs to understand the general interfaces). The detailed
implementation for specific resource types can even be encapsulated as a
library.

@Becket

Thanks for sharing your thoughts on the final state. Regardless of the
details of what the interfaces should look like, I think this is a really
good abstraction for supporting general resource types.

I'd like to further clarify that the following three things are all that
the "Flink core" needs to understand.

   - The *amount* of resource, for scheduling. Actually, we already have
   the Resource class in ResourceProfile and ResourceSpec for extended
   resources. It's just not really used.
   - The *info*, that Flink provides to the operators / user code.
   - The *provider*, which generates the info based on the amount.

The "core" does not need to understand the specific implementation details
of the above three. They can even be implemented in a 3rd-party library,
similar to how we allow users to define their custom MetricReporter.
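
A rough sketch of that separation, under my own assumptions: all names below
(ResourceInfo, ResourceInfoSource, GpuInfo, SequentialGpuSource, CoreSketch)
are made up for illustration and are neither existing Flink classes nor the
interface Becket proposed. The "core" part only sees the two general
interfaces; the GPU-specific classes could live in a separate library.

```java
import java.util.ArrayList;
import java.util.List;

/** The *info* handed to operators; opaque to the core. Hypothetical name. */
interface ResourceInfo {
    String describe();
}

/** The *provider*: turns an *amount* into concrete infos. Hypothetical name. */
interface ResourceInfoSource<I extends ResourceInfo> {
    List<I> provide(long amount);
}

/** GPU-specific info; could be shipped outside the core. */
final class GpuInfo implements ResourceInfo {
    final int index;

    GpuInfo(int index) {
        this.index = index;
    }

    @Override
    public String describe() {
        return "gpu:" + index;
    }
}

/** Toy provider that simply numbers GPUs from 0; a real one might run a discovery script. */
final class SequentialGpuSource implements ResourceInfoSource<GpuInfo> {
    @Override
    public List<GpuInfo> provide(long amount) {
        List<GpuInfo> infos = new ArrayList<>();
        for (int i = 0; i < amount; i++) {
            infos.add(new GpuInfo(i));
        }
        return infos;
    }
}

/** Stand-in for the "core": it depends only on the general interfaces above. */
final class CoreSketch {
    static List<String> describeAll(
            ResourceInfoSource<? extends ResourceInfo> source, long amount) {
        List<String> out = new ArrayList<>();
        for (ResourceInfo info : source.provide(amount)) {
            out.add(info.describe());
        }
        return out;
    }
}
```

Supporting another resource type, say FPGA, would then mean adding a new info
and provider pair, without any change to CoreSketch.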

Thank you~

Xintong Song



On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <be...@gmail.com> wrote:

> Thanks for the comment, Stephan.
>
>   - If everything becomes a "core feature", it will make the project hard
> > to develop in the future. Thinking "library" / "plugin" / "extension"
> style
> > where possible helps.
>
>
> Completely agree. It is much more important to design a mechanism than
> focusing on a specific case. Here is what I am thinking to fully support
> custom resource management:
> 1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
> resource and the amount required. They will be used to find suitable TM
> slots to run the tasks. At this point, the resources are only measured by
> amount, i.e. they do not have individual IDs.
>
> 2. On the TM side, have something like *"ResourceInfoProvider"* to identify
> and provide the detailed information of the individual resource, e.g. GPU
> ID. It is important because the operator may have to explicitly interact
> with the physical resource it uses. The ResourceInfoProvider might look
> something like the following.
> interface ResourceInfoProvider<INFO> {
>     Map<AbstractID, INFO> retrieveResourceInfo(OperatorId opId, ResourceProfile resourceProfile);
> }
>
> - There could be several "*ResourceInfoProvider*" configured on the TM to
> retrieve the information for different resources.
> - The TM will be responsible to assign those individual resources to each
> operator according to their requested amount.
> - The operators will be able to get the ResourceInfo from their
> RuntimeContext.
>
> If we agree this is a reasonable final state, we can adapt the current FLIP
> to it. In fact it does not sound like a big change to me. All the proposed
> configuration can stay as is; it is just that Flink itself won't care about
> it. Instead, a GPUInfoProvider implementing the ResourceInfoProvider will
> use it.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <se...@apache.org> wrote:
>
> > Hi all!
> >
> > The main point I wanted to throw into the discussion is the following:
> >   - With more and more use cases, more and more tools go into Flink
> >   - If everything becomes a "core feature", it will make the project hard
> > to develop in the future. Thinking "library" / "plugin" / "extension"
> style
> > where possible helps.
> >
> >   - A good thought experiment is always: How many future developers have
> to
> > interact with this code (and possibly understand it partially), even if
> the
> > features they touch have nothing to do with GPU support. If many
> > contributors to unrelated features will have to touch it and understand
> it,
> > then let's think if there is a different solution. Maybe there is not,
> but
> > then we should be sure why.
> >
> >   - That led me to raising this issue: If the GPU manager becomes a core
> > service in the TaskManager, Environment, RuntimeContext, etc. then
> everyone
> > developing TM and streaming tasks need to understand the GPU manager.
> That
> > seems oddly specific, is my impression.
> >
> > Access to configuration seems not the right reason to do that. We should
> > expose the Flink configuration from the RuntimeContext anyways.
> >
> > If GPUs are sliced and assigned during scheduling, there may be reason,
> > although it looks that it would belong to the slot then. Is that what we
> > are doing here?
> >
> > Best,
> > Stephan
> >
> >
> > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <to...@gmail.com>
> > wrote:
> >
> > >  Thanks for the feedback, Becket.
> > >
> > > IMO, eventually an operator should only see info of GPUs that are
> > dedicated
> > > for it, instead of all GPUs on the machine/container in the current
> > design.
> > > It does not make sense to let the user who writes a UDF to worry about
> > > coordination among multiple operators running on the same machine. And
> if
> > > we want to limit the GPU info an operator sees, we should not let the
> > > operator to instantiate GPUManager, which means we have to expose
> > something
> > > through runtime context, either GPU info or some kind of limited access
> > to
> > > the GPUManager.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <be...@gmail.com>
> wrote:
> > >
> > > > > It probably makes sense for us to first agree on the final state. More
> > > > specifically, will the resource info be exposed through runtime
> context
> > > > eventually?
> > > >
> > > > If that is the final state and we have a seamless migration story
> from
> > > this
> > > > FLIP to that final state, Personally I think it is OK to expose the
> GPU
> > > > info in the runtime context.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <tonysong820@gmail.com
> >
> > > > wrote:
> > > >
> > > > > @Yangze,
> > > > > I think what Stephan means (@Stephan, please correct me if I'm
> wrong)
> > > is
> > > > > that, we might not need to hold and maintain the GPUManager as a
> > > service
> > > > in
> > > > > TaskManagerServices or RuntimeContext. An alternative is to create
> /
> > > > > retrieve the GPUManager only in the operators that need it, e.g.,
> > with
> > > a
> > > > > static method `GPUManager.get()`.
> > > > >
> > > > > @Stephan,
> > > > > I agree with you on excluding GPUManager from TaskManagerServices.
> > > > >
> > > > >    - For the first step, where we provide unified TM-level GPU
> > > > information
> > > > >    to all operators, it should be fine to have operators access /
> > > > >    lazy-initiate GPUManager by themselves.
> > > > >    - In future, we might have some more fine-grained GPU
> management,
> > > > where
> > > > >    we need to maintain GPUManager as a service and put GPU info in
> > slot
> > > > >    profiles. But at least for now it's not necessary to introduce
> > such
> > > > >    complexity.
> > > > >
> > > > > However, I have some concerns on excluding GPUManager from
> > > RuntimeContext
> > > > > and let operators access it directly.
> > > > >
> > > > >    - Configurations needed for creating the GPUManager is not
> always
> > > > >    available for operators.
> > > > >    - If later we want to have fine-grained control over GPU (e.g.,
> > > > >    operators in each slot can only see GPUs reserved for that
> slot),
> > > the
> > > > >    approach cannot be easily extended.
> > > > >
> > > > > I would suggest to wrap the GPUManager behind RuntimeContext and
> only
> > > > > expose the GPUInfo to users. For now, we can declare a method
> > > > > `getGPUInfo()` in RuntimeContext, with a default definition that
> > calls
> > > > > `GPUManager.get()` to get the lazily-created GPUManager. If later
> we
> > > want
> > > > > to create / retrieve GPUManager in a different way, we can simply
> > > change
> > > > > how `getGPUInfo` is implemented, without needing to change any
> public
> > > > > interfaces.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <ka...@gmail.com>
> > > wrote:
> > > > >
> > > > > > > @Stephan
> > > > > > Do you mean Minicluster? Yes, it makes sense to share the GPU
> > Manager
> > > > > > in such scenario.
> > > > > > If that's what you worry about, I'm +1 for holding
> > > > > > GPUManager(ExternalResourceManagers) in TaskExecutor instead of
> > > > > > TaskManagerServices.
> > > > > >
> > > > > > Regarding the RuntimeContext/FunctionContext, it just holds the
> GPU
> > > > > > info instead of the GPU Manager. AFAIK, it's the only place we
> > could
> > > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <
> > isaac@paddlesoft.net
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote
> > > ----
> > > > > > >
> > > > > > > > > Can we somehow keep this out of the TaskManager services
> > > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > > ExternalServicesManagers in future) is conceptually one of
> the
> > > task
> > > > > > > > manager services, just like MemoryManager before 1.10.
> > > > > > > > - It maintains/holds the GPU resource at TM level and all of
> > the
> > > > > > > > operators allocate the GPU resources from it. So, it should
> be
> > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > - We could add a collection called ExternalResourceManagers
> to
> > > hold
> > > > > > > > all managers of other external resources in the future.
> > > > > > > >
> > > > > > >
> > > > > > > Can you help me understand why this needs the addition in
> > > > > > > TaskManagerServices
> > > > > > > or in the RuntimeContext?
> > > > > > > Are you worried about the case when multiple Task Executors run
> > in
> > > > the
> > > > > > same
> > > > > > > JVM? That's not common, but wouldn't it actually be good in
> that
> > > case
> > > > > to
> > > > > > > share the GPU Manager, given that the GPU is shared?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Stephan
> > > > > > >
> > > > > > > ---------------------------
> > > > > > >
> > > > > > >
> > > > > > > > What parts need information about this?
> > > > > > > > In this FLIP, operators need the information. Thus, we expose
> > GPU
> > > > > > > > information to the RuntimeContext/FunctionContext. The slot
> > > profile
> > > > > is
> > > > > > > > not aware of GPU resources as GPU is TM level resource now.
> > > > > > > >
> > > > > > > > > Can the GPU Manager be a "self contained" thing that simply
> > > takes
> > > > > the
> > > > > > > > configuration, and then abstracts everything internally?
> > > > > > > > Yes, we just pass the path/args of the discover script and
> how
> > > many
> > > > > > > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > > > > > > information and expose them to the
> > RuntimeContext/FunctionContext
> > > > of
> > > > > > > > Operators. Meanwhile, we'd better not allow operators to
> > directly
> > > > > > > > access GPUManager, it should get what they want from Context.
> > We
> > > > > could
> > > > > > > > then decouple the interface/implementation of GPUManager and
> > > Public
> > > > > > > > API.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <
> sewen@apache.org
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > It sounds fine to initially start with GPU specific support
> > and
> > > > > think
> > > > > > > > about
> > > > > > > > > generalizing this once we better understand the space.
> > > > > > > > >
> > > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > > - Can we somehow keep this out of the TaskManager services?
> > > > > Anything
> > > > > > we
> > > > > > > > > have to pull through all layers of the TM makes the TM
> > > components
> > > > > yet
> > > > > > > > more
> > > > > > > > > complex and harder to maintain.
> > > > > > > > >
> > > > > > > > > - What parts need information about this?
> > > > > > > > > -> do the slot profiles need information about the GPU?
> > > > > > > > > -> Can the GPU Manager be a "self contained" thing that
> > simply
> > > > > takes
> > > > > > > > > the configuration, and then abstracts everything
> internally?
> > > > > > Operators
> > > > > > > > can
> > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> > karmagyz@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for all the feedbacks.
> > > > > > > > > >
> > > > > > > > > > @Becket
> > > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add
> > them
> > > to
> > > > > the
> > > > > > > > > > Public API section.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > @Stephan @Becket
> > > > > > > > > > Regarding the general extended resource mechanism, I
> second
> > > > > > Xintong's
> > > > > > > > > > suggestion.
> > > > > > > > > > - It's better to leverage ResourceProfile and
> ResourceSpec
> > > > after
> > > > > we
> > > > > > > > > > supporting fine-grained GPU scheduling. As a first step
> > > > > proposal, I
> > > > > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > > > > - Regarding the "Extended Resource Manager", if I
> > understand
> > > > > > > > > > correctly, it just a code refactoring atm, we could
> extract
> > > the
> > > > > > > > > > open/close/allocateExtendResources of GPUManager to that
> > > > > > interface. If
> > > > > > > > > > that is the case, +1 to do it during implementation.
> > > > > > > > > >
> > > > > > > > > > @Xingbo
> > > > > > > > > > As Xintong said, we looked into how Spark supports a
> > general
> > > > > > "Custom
> > > > > > > > > > Resource Scheduling" before and decided to introduce a
> > common
> > > > > > resource
> > > > > > > > > > configuration
> > > > > > > > > >
> > > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > to make it more extensible. I think the "resource" is a
> > > proper
> > > > > > level
> > > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > > hxbks2ks@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > >
> > > > > > > > > > > There is no doubt that GPU resource management support
> > will
> > > > > > greatly
> > > > > > > > > > > facilitate the development of AI-related applications
> by
> > > > > PyFlink
> > > > > > > > users.
> > > > > > > > > > >
> > > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > > >
> > > > > > > > > > > Regarding the names of several GPU configurations, I
> > think
> > > it
> > > > > is
> > > > > > > > better
> > > > > > > > > > to
> > > > > > > > > > > delete the resource field makes it consistent with the
> > > names
> > > > of
> > > > > > other
> > > > > > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > > > > > >
> > > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > >
> > > > > > > > > > > Xingbo
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song <to...@gmail.com> 于2020年3月4日周三
> > > 上午10:39写道:
> > > > > > > > > > >
> > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > >
> > > > > > > > > > > > Actually, Yangze, Yang and I also had an offline
> > > discussion
> > > > > > about
> > > > > > > > > > making
> > > > > > > > > > > > the "GPU Support" as some general "Extended Resource
> > > > > Support".
> > > > > > We
> > > > > > > > > > believe
> > > > > > > > > > > > supporting extended resources in a general mechanism
> is
> > > > > > definitely
> > > > > > > > a
> > > > > > > > > > good
> > > > > > > > > > > > and extensible way. The reason we propose this FLIP
> > > > narrowing
> > > > > > its
> > > > > > > > scope
> > > > > > > > > > > > down to GPU alone, is mainly for the concern on extra
> > > > efforts
> > > > > > and
> > > > > > > > > > review
> > > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > > >
> > > > > > > > > > > > To come up with a well design on a general extended
> > > > resource
> > > > > > > > management
> > > > > > > > > > > > mechanism, we would need to investigate more on how
> > > people
> > > > > use
> > > > > > > > > > different
> > > > > > > > > > > > kind of resources in practice. For GPU, we learnt
> such
> > > > > > knowledge
> > > > > > > > from
> > > > > > > > > > the
> > > > > > > > > > > > experts, Becket and his team members. But for FPGA,
> or
> > > > other
> > > > > > > > potential
> > > > > > > > > > > > extended resources, we don't have such convenient
> > > > information
> > > > > > > > sources,
> > > > > > > > > > > > making the investigation requires more efforts,
> which I
> > > > tend
> > > > > to
> > > > > > > > think
> > > > > > > > > > is
> > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > >
> > > > > > > > > > > > On the other hand, we also looked into how Spark
> > > supports a
> > > > > > general
> > > > > > > > > > "Custom
> > > > > > > > > > > > Resource Scheduling". Assuming we want to have a
> > similar
> > > > > > general
> > > > > > > > > > extended
> > > > > > > > > > > > resource mechanism in the future, we believe that the
> > > > current
> > > > > > GPU
> > > > > > > > > > support
> > > > > > > > > > > > design can be easily extended, in an incremental way
> > > > without
> > > > > > too
> > > > > > > > many
> > > > > > > > > > > > reworks.
> > > > > > > > > > > >
> > > > > > > > > > > > - The most important part is probably user
> interfaces.
> > > > Spark
> > > > > > > > offers
> > > > > > > > > > > > configuration options to define the amount, discovery
> > > > script
> > > > > > and
> > > > > > > > > > vendor
> > > > > > > > > > > > (on
> > > > > > > > > > > > k8s) in a per resource type bias [1], which is very
> > > similar
> > > > > to
> > > > > > > > what
> > > > > > > > > > we
> > > > > > > > > > > > proposed in this FLIP. I think it's not necessary to
> > > expose
> > > > > > > > config
> > > > > > > > > > > > options
> > > > > > > > > > > > in the general way atm, since we do not have supports
> > for
> > > > > other
> > > > > > > > > > resource
> > > > > > > > > > > > types now. If later we decided to have per resource
> > type
> > > > > config
> > > > > > > > > > > > options, we
> > > > > > > > > > > > can have backwards compatibility on the current
> > proposed
> > > > > > options
> > > > > > > > > > with
> > > > > > > > > > > > simple key mapping.
> > > > > > > > > > > > - For the GPU Manager, if later needed we can change
> it
> > > to
> > > > a
> > > > > > > > > > "Extended
> > > > > > > > > > > > Resource Manager" (or whatever it is called). That
> > should
> > > > be
> > > > > a
> > > > > > > > pure
> > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are
> > already
> > > > > > > > fields for
> > > > > > > > > > > > general extended resource. We can of course leverage
> > them
> > > > > when
> > > > > > > > > > > > supporting
> > > > > > > > > > > > fine grained GPU scheduling. That is also not in the
> > > scope
> > > > of
> > > > > > > > this
> > > > > > > > > > first
> > > > > > > > > > > > step proposal, and would require FLIP-56 to be
> finished
> > > > > first.
> > > > > > > > > > > >
> > > > > > > > > > > > To summary up, I agree with Becket that have a
> separate
> > > > FLIP
> > > > > > for
> > > > > > > > the
> > > > > > > > > > > > general extended resource mechanism, and keep it in
> > mind
> > > > when
> > > > > > > > > > discussing
> > > > > > > > > > > > and implementing the current one.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > > becket.qin@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > That's a good point, Stephan. It makes total sense
> to
> > > > > > generalize
> > > > > > > > the
> > > > > > > > > > > > > resource management to support custom resources.
> > Having
> > > > > that
> > > > > > > > allows
> > > > > > > > > > users
> > > > > > > > > > > > > to add new resources by themselves. The general
> > > resource
> > > > > > > > management
> > > > > > > > > > may
> > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. The custom resource type definition. It is
> > supported
> > > > by
> > > > > > the
> > > > > > > > > > extended
> > > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This
> > > will
> > > > > > likely
> > > > > > > > cover
> > > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how
> to
> > > > assign
> > > > > > the
> > > > > > > > > > resources
> > > > > > > > > > > > > to different tasks, operators, and so on. This may
> > > > require
> > > > > > two
> > > > > > > > > > levels /
> > > > > > > > > > > > > steps:
> > > > > > > > > > > > > a. Subtask level - make sure the subtasks are put
> > into
> > > > > > > > suitable
> > > > > > > > > > > > slots.
> > > > > > > > > > > > > It is done by the global RM and is not customizable
> > > right
> > > > > > now.
> > > > > > > > > > > > > b. Operator level - map the exact resource to the
> > > > operators
> > > > > > > > in
> > > > > > > > > > TM.
> > > > > > > > > > > > e.g.
> > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This
> step
> > > is
> > > > > > needed
> > > > > > > > > > assuming
> > > > > > > > > > > > > the global RM does not distinguish individual
> > resources
> > > > of
> > > > > > the
> > > > > > > > same
> > > > > > > > > > type.
> > > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it
> > > should
> > > > > > > > discover the
> > > > > > > > > > > > > physical GPU information and bind/match them to
> each
> > > > > > operators.
> > > > > > > > > > Making
> > > > > > > > > > > > this
> > > > > > > > > > > > > general will fill in the missing piece to support
> > > custom
> > > > > > resource
> > > > > > > > > > type
> > > > > > > > > > > > > definition. But I'd avoid calling it a "External
> > > Resource
> > > > > > > > Manager" to
> > > > > > > > > > > > avoid
> > > > > > > > > > > > > confusion with RM, maybe something like "Operator
> > > > Resource
> > > > > > > > Assigner"
> > > > > > > > > > > > would
> > > > > > > > > > > > > be more accurate. So for each resource type users
> can
> > > > have
> > > > > an
> > > > > > > > > > optional
> > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For memory,
> > > users
> > > > > > don't
> > > > > > > > need
> > > > > > > > > > > > this,
> > > > > > > > > > > > > but for other extended resources, users may need
> > that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Personally I think a pluggable "Operator Resource
> > > > Assigner"
> > > > > > is
> > > > > > > > > > achievable
> > > > > > > > > > > > > in this FLIP. But I am also OK with having that in
> a
> > > > > separate
> > > > > > > > FLIP
> > > > > > > > > > > > because
> > > > > > > > > > > > > the interface between the "Operator Resource
> > Assigner"
> > > > and
> > > > > > > > operator
> > > > > > > > > > may
> > > > > > > > > > > > > take a while to settle down if we want to make it
> > > > generic.
> > > > > > But I
> > > > > > > > > > think
> > > > > > > > > > > > our
> > > > > > > > > > > > > implementation should take this future work into
> > > > > > consideration so
> > > > > > > > > > that we
> > > > > > > > > > > > > don't need to break backwards compatibility once we
> > > have
> > > > > > that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > > > > > sewen@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I cannot really give much input into the
> mechanics
> > of
> > > > > > GPU-aware
> > > > > > > > > > > > > scheduling
> > > > > > > > > > > > > > and GPU allocation, as I have no experience with
> > > that.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > One thought I had when reading the proposal is if
> > it
> > > > > makes
> > > > > > > > sense to
> > > > > > > > > > > > look
> > > > > > > > > > > > > at
> > > > > > > > > > > > > > the "GPU Manager" as an "External Resource
> > Manager",
> > > > and
> > > > > > GPU
> > > > > > > > is one
> > > > > > > > > > > > such
> > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > The way I understand the ResourceProfile and
> > > > > ResourceSpec,
> > > > > > > > that is
> > > > > > > > > > how
> > > > > > > > > > > > it
> > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > It has the advantage that it looks more
> extensible.
> > > > Maybe
> > > > > > > > there is
> > > > > > > > > > a
> > > > > > > > > > > > GPU
> > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and
> > FPGA
> > > > > > > > Resource, a
> > > > > > > > > > > > Alibaba
> > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource
> > management
> > > > > > support
> > > > > > > > is a
> > > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > > for machine learning use cases. Actually it is
> > one
> > > of
> > > > > the
> > > > > > > > mostly
> > > > > > > > > > > > asked
> > > > > > > > > > > > > > > question from the users who are interested in
> > using
> > > > > Flink
> > > > > > > > for ML.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be
> > > > > > mentioned in
> > > > > > > > the
> > > > > > > > > > > > public
> > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > 2. Is the data structure that holds GPU info
> > also a
> > > > > > public
> > > > > > > > API?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off
> > the
> > > > > > > > discussion,
> > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Big +1 for this feature. Supporting using of
> > GPU
> > > in
> > > > > > Flink
> > > > > > > > is
> > > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks
> > good
> > > > to
> > > > > > me. I
> > > > > > > > > > think
> > > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > > very good first step for Flink's GPU
> supports.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > > > > > > > karmagyz@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We would like to start a discussion thread
> on
> > > > > > "FLIP-108:
> > > > > > > > Add
> > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This FLIP mainly discusses the following
> > > issues:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Enable user to configure how many GPUs
> in a
> > > > task
> > > > > > > > executor
> > > > > > > > > > and
> > > > > > > > > > > > > > > > > forward such requirements to the external
> > > > resource
> > > > > > > > managers
> > > > > > > > > > (for
> > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > > - Provide information of available GPU
> > > resources
> > > > to
> > > > > > > > > > operators.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as
> > > follows:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Forward GPU resource requirements to
> > > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task
> > > manager
> > > > > > > > services to
> > > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > > and expose GPU resource information to the
> > > > context
> > > > > of
> > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > - Introduce the default script for GPU
> > > discovery,
> > > > > in
> > > > > > > > which we
> > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > the privilege mode to help user to achieve
> > > > > > worker-level
> > > > > > > > > > isolation
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Please find more details in the FLIP wiki
> > > > document
> > > > > > [1].
> > > > > > > > > > Looking
> > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Becket Qin <be...@gmail.com>.
Thanks for the comment, Stephan.

> - If everything becomes a "core feature", it will make the project hard
> to develop in the future. Thinking "library" / "plugin" / "extension" style
> where possible helps.


Completely agree. It is much more important to design a general mechanism than
to focus on a specific case. Here is what I have in mind to fully support
custom resource management:
1. On the JM / RM side, use ResourceProfile and ResourceSpec to define the
resource and the amount required. They will be used to find suitable TM
slots to run the tasks. At this point, the resources are only measured by
amount, i.e. they do not have individual IDs.

2. On the TM side, have something like *"ResourceInfoProvider"* to identify
and provide the detailed information of each individual resource, e.g. the GPU
ID. This is important because the operator may have to explicitly interact
with the physical resource it uses. The ResourceInfoProvider might look
something like the following.
interface ResourceInfoProvider<INFO> {
    Map<AbstractID, INFO> retrieveResourceInfo(
            OperatorId opId, ResourceProfile resourceProfile);
}
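
To make the idea concrete, here is a rough sketch of what a GPU-specific
provider could look like. It is only an illustration under simplifying
assumptions, not the FLIP's design: plain String and int stand in for Flink's
AbstractID, OperatorId and ResourceProfile so that the snippet compiles on its
own, and SimpleResourceInfoProvider, GpuInfo, GpuInfoProvider and the "gpu-N"
key format are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the interface above (assumption: String replaces
// AbstractID/OperatorId, and an int amount replaces ResourceProfile).
interface SimpleResourceInfoProvider<INFO> {
    Map<String, INFO> retrieveResourceInfo(String operatorId, int requestedAmount);
}

// Hypothetical holder for the detail an operator needs to address one GPU.
final class GpuInfo {
    final String deviceIndex;

    GpuInfo(String deviceIndex) {
        this.deviceIndex = deviceIndex;
    }
}

// Hypothetical provider backed by a list of device indexes, e.g. the output
// of a discovery script. It returns the first N devices and keeps no
// per-operator bookkeeping.
final class GpuInfoProvider implements SimpleResourceInfoProvider<GpuInfo> {
    private final List<String> discoveredIndexes;

    GpuInfoProvider(List<String> discoveredIndexes) {
        this.discoveredIndexes = new ArrayList<>(discoveredIndexes);
    }

    @Override
    public Map<String, GpuInfo> retrieveResourceInfo(String operatorId, int requestedAmount) {
        if (requestedAmount < 0 || requestedAmount > discoveredIndexes.size()) {
            throw new IllegalArgumentException(
                    "Operator " + operatorId + " requested " + requestedAmount
                            + " GPUs, but " + discoveredIndexes.size() + " were discovered");
        }
        // LinkedHashMap keeps the devices in discovery order.
        Map<String, GpuInfo> result = new LinkedHashMap<>();
        for (int i = 0; i < requestedAmount; i++) {
            String index = discoveredIndexes.get(i);
            result.put("gpu-" + index, new GpuInfo(index));
        }
        return result;
    }
}
```

An operator asking for two GPUs on a TM that discovered devices 0, 1 and 2
would get back entries for "gpu-0" and "gpu-1" in this sketch.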

- There could be several "*ResourceInfoProvider*" configured on the TM to
retrieve the information for different resources.
- The TM will be responsible for assigning those individual resources to each
operator according to its requested amount.
- The operators will be able to get the ResourceInfo from their
RuntimeContext.

If we agree this is a reasonable final state, we can adapt the current FLIP
to it. In fact, it does not sound like a big change to me. All the proposed
configuration options can stay as they are; it is just that Flink itself won't
care about them. Instead, a GPUInfoProvider implementing the
ResourceInfoProvider will use them.
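
For the TM-side step of assigning individual resources to operators according
to their requested amount, a minimal sketch could look like the following. All
names here (OperatorResourceAssigner, assign, release) are hypothetical, and
resources and operators are identified by plain strings, which is an
assumption made to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical TM-side bookkeeping: each individual resource (e.g. one GPU)
// is handed to at most one operator at a time.
final class OperatorResourceAssigner {
    private final Deque<String> free = new ArrayDeque<>();
    private final Map<String, List<String>> assigned = new HashMap<>();

    OperatorResourceAssigner(Collection<String> resourceIds) {
        free.addAll(resourceIds);
    }

    // Reserves `amount` resources for the operator, or fails without
    // changing any state if the request cannot be satisfied.
    synchronized List<String> assign(String operatorId, int amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("amount must not be negative");
        }
        if (assigned.containsKey(operatorId)) {
            throw new IllegalStateException(operatorId + " already holds resources");
        }
        if (amount > free.size()) {
            throw new IllegalStateException(
                    "requested " + amount + " but only " + free.size() + " are free");
        }
        List<String> granted = new ArrayList<>(amount);
        for (int i = 0; i < amount; i++) {
            granted.add(free.poll());
        }
        assigned.put(operatorId, granted);
        return Collections.unmodifiableList(granted);
    }

    // Returns the operator's resources to the free pool; a no-op if the
    // operator holds nothing.
    synchronized void release(String operatorId) {
        List<String> granted = assigned.remove(operatorId);
        if (granted != null) {
            free.addAll(granted);
        }
    }
}
```

The point of the sketch is only that amount-based requests from the JM / RM
side can be turned into concrete, exclusive resource IDs on the TM side; how
this would be wired into the RuntimeContext is left open.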

Thanks,

Jiangjie (Becket) Qin

On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> The main point I wanted to throw into the discussion is the following:
>   - With more and more use cases, more and more tools go into Flink
>   - If everything becomes a "core feature", it will make the project hard
> to develop in the future. Thinking "library" / "plugin" / "extension" style
> where possible helps.
>
>   - A good thought experiment is always: How many future developers have to
> interact with this code (and possibly understand it partially), even if the
> features they touch have nothing to do with GPU support. If many
> contributors to unrelated features will have to touch it and understand it,
> then let's think if there is a different solution. Maybe there is not, but
> then we should be sure why.
>
>   - That led me to raising this issue: If the GPU manager becomes a core
> service in the TaskManager, Environment, RuntimeContext, etc. then everyone
> developing TM and streaming tasks needs to understand the GPU manager. That
> seems oddly specific, is my impression.
>
> Access to configuration seems not the right reason to do that. We should
> expose the Flink configuration from the RuntimeContext anyways.
>
> If GPUs are sliced and assigned during scheduling, there may be a reason,
> although it looks like it would then belong to the slot. Is that what we
> are doing here?
>
> Best,
> Stephan
>
>
> On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <to...@gmail.com>
> wrote:
>
> >  Thanks for the feedback, Becket.
> >
> > IMO, eventually an operator should only see info of GPUs that are
> > dedicated to it, instead of all GPUs on the machine/container as in the
> > current design. It does not make sense to let the user who writes a UDF
> > worry about coordination among multiple operators running on the same
> > machine. And if we want to limit the GPU info an operator sees, we should
> > not let the operator instantiate GPUManager, which means we have to expose
> > something through the runtime context, either GPU info or some kind of
> > limited access to the GPUManager.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <be...@gmail.com> wrote:
> >
> > > It probably makes sense for us to first agree on the final state. More
> > > specifically, will the resource info be exposed through the runtime
> > > context eventually?
> > >
> > > If that is the final state and we have a seamless migration story from
> > > this FLIP to that final state, personally I think it is OK to expose the
> > > GPU info in the runtime context.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > >
> > > > @Yangze,
> > > > I think what Stephan means (@Stephan, please correct me if I'm wrong)
> > > > is that we might not need to hold and maintain the GPUManager as a
> > > > service in TaskManagerServices or RuntimeContext. An alternative is to
> > > > create / retrieve the GPUManager only in the operators that need it,
> > > > e.g., with a static method `GPUManager.get()`.
> > > >
> > > > @Stephan,
> > > > I agree with you on excluding GPUManager from TaskManagerServices.
> > > >
> > > >    - For the first step, where we provide unified TM-level GPU
> > > >    information to all operators, it should be fine to have operators
> > > >    access / lazily initialize GPUManager by themselves.
> > > >    - In the future, we might have some more fine-grained GPU management,
> > > >    where we need to maintain GPUManager as a service and put GPU info in
> > > >    slot profiles. But at least for now it's not necessary to introduce
> > > >    such complexity.
> > > >
> > > > However, I have some concerns about excluding GPUManager from
> > > > RuntimeContext and letting operators access it directly.
> > > >
> > > >    - Configurations needed for creating the GPUManager are not always
> > > >    available to operators.
> > > >    - If later we want to have fine-grained control over GPUs (e.g.,
> > > >    operators in each slot can only see GPUs reserved for that slot),
> > > >    the approach cannot be easily extended.
> > > >
> > > > I would suggest wrapping the GPUManager behind RuntimeContext and only
> > > > exposing the GPUInfo to users. For now, we can declare a method
> > > > `getGPUInfo()` in RuntimeContext, with a default definition that calls
> > > > `GPUManager.get()` to get the lazily-created GPUManager. If later we
> > > > want to create / retrieve GPUManager in a different way, we can simply
> > > > change how `getGPUInfo` is implemented, without needing to change any
> > > > public interfaces.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <ka...@gmail.com>
> > > > wrote:
> > > >
> > > > > @Stephan
> > > > > Do you mean MiniCluster? Yes, it makes sense to share the GPU Manager
> > > > > in such a scenario.
> > > > > If that's what you worry about, I'm +1 for holding
> > > > > GPUManager(ExternalResourceManagers) in TaskExecutor instead of
> > > > > TaskManagerServices.
> > > > >
> > > > > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > > > > info instead of the GPU Manager. AFAIK, it's the only place we could
> > > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <isaac@paddlesoft.net>
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote
> > ----
> > > > > >
> > > > > > > > Can we somehow keep this out of the TaskManager services
> > > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > > ExternalServicesManagers in future) is conceptually one of the
> > task
> > > > > > > manager services, just like MemoryManager before 1.10.
> > > > > > > - It maintains/holds the GPU resource at TM level and all of
> the
> > > > > > > operators allocate the GPU resources from it. So, it should be
> > > > > > > exclusive to a single TaskExecutor.
> > > > > > > - We could add a collection called ExternalResourceManagers to
> > hold
> > > > > > > all managers of other external resources in the future.
> > > > > > >
> > > > > >
> > > > > > Can you help me understand why this needs the addition in
> > > > > > TaskManagerServices or in the RuntimeContext?
> > > > > > Are you worried about the case when multiple Task Executors run in
> > > > > > the same JVM? That's not common, but wouldn't it actually be good in
> > > > > > that case to share the GPU Manager, given that the GPU is shared?
> > > > > >
> > > > > > Thanks,
> > > > > > Stephan
> > > > > >
> > > > > > ---------------------------
> > > > > >
> > > > > >
> > > > > > > What parts need information about this?
> > > > > > > In this FLIP, operators need the information. Thus, we expose
> GPU
> > > > > > > information to the RuntimeContext/FunctionContext. The slot
> > profile
> > > > is
> > > > > > > not aware of GPU resources as GPU is TM level resource now.
> > > > > > >
> > > > > > > > Can the GPU Manager be a "self contained" thing that simply
> > takes
> > > > the
> > > > > > > configuration, and then abstracts everything internally?
> > > > > > > Yes, we just pass the path/args of the discover script and how
> > many
> > > > > > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > > > > > information and expose them to the
> RuntimeContext/FunctionContext
> > > of
> > > > > > > Operators. Meanwhile, we'd better not allow operators to
> directly
> > > > > > > access GPUManager, it should get what they want from Context.
> We
> > > > could
> > > > > > > then decouple the interface/implementation of GPUManager and
> > Public
> > > > > > > API.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <sewen@apache.org
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > It sounds fine to initially start with GPU specific support
> and
> > > > think
> > > > > > > about
> > > > > > > > generalizing this once we better understand the space.
> > > > > > > >
> > > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > > - Can we somehow keep this out of the TaskManager services?
> > > > Anything
> > > > > we
> > > > > > > > have to pull through all layers of the TM makes the TM
> > components
> > > > yet
> > > > > > > more
> > > > > > > > complex and harder to maintain.
> > > > > > > >
> > > > > > > > - What parts need information about this?
> > > > > > > > -> do the slot profiles need information about the GPU?
> > > > > > > > -> Can the GPU Manager be a "self contained" thing that
> simply
> > > > takes
> > > > > > > > the configuration, and then abstracts everything internally?
> > > > > Operators
> > > > > > > can
> > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <
> karmagyz@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for all the feedbacks.
> > > > > > > > >
> > > > > > > > > @Becket
> > > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add
> them
> > to
> > > > the
> > > > > > > > > Public API section.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > @Stephan @Becket
> > > > > > > > > Regarding the general extended resource mechanism, I second
> > > > > Xintong's
> > > > > > > > > suggestion.
> > > > > > > > > - It's better to leverage ResourceProfile and ResourceSpec
> > > after
> > > > we
> > > > > > > > > supporting fine-grained GPU scheduling. As a first step
> > > > proposal, I
> > > > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > > > - Regarding the "Extended Resource Manager", if I
> understand
> > > > > > > > > correctly, it just a code refactoring atm, we could extract
> > the
> > > > > > > > > open/close/allocateExtendResources of GPUManager to that
> > > > > interface. If
> > > > > > > > > that is the case, +1 to do it during implementation.
> > > > > > > > >
> > > > > > > > > @Xingbo
> > > > > > > > > As Xintong said, we looked into how Spark supports a
> general
> > > > > "Custom
> > > > > > > > > Resource Scheduling" before and decided to introduce a
> common
> > > > > resource
> > > > > > > > > configuration
> > > > > > > > >
> > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > to make it more extensible. I think the "resource" is a
> > proper
> > > > > level
> > > > > > > > > to contain all the configs of extended resources.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > > hxbks2ks@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > >
> > > > > > > > > > There is no doubt that GPU resource management support
> will
> > > > > greatly
> > > > > > > > > > facilitate the development of AI-related applications by
> > > > PyFlink
> > > > > > > users.
> > > > > > > > > >
> > > > > > > > > > I have only one comment about this wiki:
> > > > > > > > > >
> > > > > > > > > > Regarding the names of several GPU configurations, I
> think
> > it
> > > > is
> > > > > > > better
> > > > > > > > > to
> > > > > > > > > > delete the resource field makes it consistent with the
> > names
> > > of
> > > > > other
> > > > > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > > > > >
> > > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > >
> > > > > > > > > > Xingbo
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Xintong Song <to...@gmail.com> 于2020年3月4日周三
> > 上午10:39写道:
> > > > > > > > > >
> > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > >
> > > > > > > > > > > Actually, Yangze, Yang and I also had an offline
> > discussion
> > > > > about
> > > > > > > > > making
> > > > > > > > > > > the "GPU Support" as some general "Extended Resource
> > > > Support".
> > > > > We
> > > > > > > > > believe
> > > > > > > > > > > supporting extended resources in a general mechanism is
> > > > > definitely
> > > > > > > a
> > > > > > > > > good
> > > > > > > > > > > and extensible way. The reason we propose this FLIP
> > > narrowing
> > > > > its
> > > > > > > scope
> > > > > > > > > > > down to GPU alone, is mainly for the concern on extra
> > > efforts
> > > > > and
> > > > > > > > > review
> > > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > > >
> > > > > > > > > > > To come up with a well design on a general extended
> > > resource
> > > > > > > management
> > > > > > > > > > > mechanism, we would need to investigate more on how
> > people
> > > > use
> > > > > > > > > different
> > > > > > > > > > > kind of resources in practice. For GPU, we learnt such
> > > > > knowledge
> > > > > > > from
> > > > > > > > > the
> > > > > > > > > > > experts, Becket and his team members. But for FPGA, or
> > > other
> > > > > > > potential
> > > > > > > > > > > extended resources, we don't have such convenient
> > > information
> > > > > > > sources,
> > > > > > > > > > > making the investigation requires more efforts, which I
> > > tend
> > > > to
> > > > > > > think
> > > > > > > > > is
> > > > > > > > > > > not necessary atm.
> > > > > > > > > > >
> > > > > > > > > > > On the other hand, we also looked into how Spark
> > supports a
> > > > > general
> > > > > > > > > "Custom
> > > > > > > > > > > Resource Scheduling". Assuming we want to have a
> similar
> > > > > general
> > > > > > > > > extended
> > > > > > > > > > > resource mechanism in the future, we believe that the
> > > current
> > > > > GPU
> > > > > > > > > support
> > > > > > > > > > > design can be easily extended, in an incremental way
> > > without
> > > > > too
> > > > > > > many
> > > > > > > > > > > reworks.
> > > > > > > > > > >
> > > > > > > > > > > - The most important part is probably user interfaces.
> > > Spark
> > > > > > > offers
> > > > > > > > > > > configuration options to define the amount, discovery
> > > script
> > > > > and
> > > > > > > > > vendor
> > > > > > > > > > > (on
> > > > > > > > > > > k8s) in a per resource type bias [1], which is very
> > similar
> > > > to
> > > > > > > what
> > > > > > > > > we
> > > > > > > > > > > proposed in this FLIP. I think it's not necessary to
> > expose
> > > > > > > config
> > > > > > > > > > > options
> > > > > > > > > > > in the general way atm, since we do not have supports
> for
> > > > other
> > > > > > > > > resource
> > > > > > > > > > > types now. If later we decided to have per resource
> type
> > > > config
> > > > > > > > > > > options, we
> > > > > > > > > > > can have backwards compatibility on the current
> proposed
> > > > > options
> > > > > > > > > with
> > > > > > > > > > > simple key mapping.
> > > > > > > > > > > - For the GPU Manager, if later needed we can change it
> > to
> > > a
> > > > > > > > > "Extended
> > > > > > > > > > > Resource Manager" (or whatever it is called). That
> should
> > > be
> > > > a
> > > > > > > pure
> > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > - For ResourceProfile and ResourceSpec, there are
> already
> > > > > > > fields for
> > > > > > > > > > > general extended resource. We can of course leverage
> them
> > > > when
> > > > > > > > > > > supporting
> > > > > > > > > > > fine grained GPU scheduling. That is also not in the
> > scope
> > > of
> > > > > > > this
> > > > > > > > > first
> > > > > > > > > > > step proposal, and would require FLIP-56 to be finished
> > > > first.
> > > > > > > > > > >
> > > > > > > > > > > To summary up, I agree with Becket that have a separate
> > > FLIP
> > > > > for
> > > > > > > the
> > > > > > > > > > > general extended resource mechanism, and keep it in
> mind
> > > when
> > > > > > > > > discussing
> > > > > > > > > > > and implementing the current one.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > > becket.qin@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > That's a good point, Stephan. It makes total sense to
> > > > > generalize
> > > > > > > the
> > > > > > > > > > > > resource management to support custom resources.
> Having
> > > > that
> > > > > > > allows
> > > > > > > > > users
> > > > > > > > > > > > to add new resources by themselves. The general
> > resource
> > > > > > > management
> > > > > > > > > may
> > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. The custom resource type definition. It is
> supported
> > > by
> > > > > the
> > > > > > > > > extended
> > > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This
> > will
> > > > > likely
> > > > > > > cover
> > > > > > > > > > > > majority of the cases.
> > > > > > > > > > > >
> > > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to
> > > assign
> > > > > the
> > > > > > > > > resources
> > > > > > > > > > > > to different tasks, operators, and so on. This may
> > > require
> > > > > two
> > > > > > > > > levels /
> > > > > > > > > > > > steps:
> > > > > > > > > > > > a. Subtask level - make sure the subtasks are put
> into
> > > > > > > suitable
> > > > > > > > > > > slots.
> > > > > > > > > > > > It is done by the global RM and is not customizable
> > right
> > > > > now.
> > > > > > > > > > > > b. Operator level - map the exact resource to the
> > > operators
> > > > > > > in
> > > > > > > > > TM.
> > > > > > > > > > > e.g.
> > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step
> > is
> > > > > needed
> > > > > > > > > assuming
> > > > > > > > > > > > the global RM does not distinguish individual
> resources
> > > of
> > > > > the
> > > > > > > same
> > > > > > > > > type.
> > > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > > >
> > > > > > > > > > > > The GPU manager is designed to do 2.b here. So it
> > should
> > > > > > > discover the
> > > > > > > > > > > > physical GPU information and bind/match them to each
> > > > > operators.
> > > > > > > > > Making
> > > > > > > > > > > this
> > > > > > > > > > > > general will fill in the missing piece to support
> > custom
> > > > > resource
> > > > > > > > > type
> > > > > > > > > > > > definition. But I'd avoid calling it a "External
> > Resource
> > > > > > > Manager" to
> > > > > > > > > > > avoid
> > > > > > > > > > > > confusion with RM, maybe something like "Operator
> > > Resource
> > > > > > > Assigner"
> > > > > > > > > > > would
> > > > > > > > > > > > be more accurate. So for each resource type users can
> > > have
> > > > an
> > > > > > > > > optional
> > > > > > > > > > > > "Operator Resource Assigner" in the TM. For memory,
> > users
> > > > > don't
> > > > > > > need
> > > > > > > > > > > this,
> > > > > > > > > > > > but for other extended resources, users may need
> that.
> > > > > > > > > > > >
> > > > > > > > > > > > Personally I think a pluggable "Operator Resource
> > > Assigner"
> > > > > is
> > > > > > > > > achievable
> > > > > > > > > > > > in this FLIP. But I am also OK with having that in a
> > > > separate
> > > > > > > FLIP
> > > > > > > > > > > because
> > > > > > > > > > > > the interface between the "Operator Resource
> Assigner"
> > > and
> > > > > > > operator
> > > > > > > > > may
> > > > > > > > > > > > take a while to settle down if we want to make it
> > > generic.
> > > > > But I
> > > > > > > > > think
> > > > > > > > > > > our
> > > > > > > > > > > > implementation should take this future work into
> > > > > consideration so
> > > > > > > > > that we
> > > > > > > > > > > > don't need to break backwards compatibility once we
> > have
> > > > > that.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > > > > sewen@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I cannot really give much input into the mechanics
> of
> > > > > GPU-aware
> > > > > > > > > > > > scheduling
> > > > > > > > > > > > > and GPU allocation, as I have no experience with
> > that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > One thought I had when reading the proposal is if
> it
> > > > makes
> > > > > > > sense to
> > > > > > > > > > > look
> > > > > > > > > > > > at
> > > > > > > > > > > > > the "GPU Manager" as an "External Resource
> Manager",
> > > and
> > > > > GPU
> > > > > > > is one
> > > > > > > > > > > such
> > > > > > > > > > > > > resource.
> > > > > > > > > > > > > The way I understand the ResourceProfile and
> > > > ResourceSpec,
> > > > > > > that is
> > > > > > > > > how
> > > > > > > > > > > it
> > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > It has the advantage that it looks more extensible.
> > > Maybe
> > > > > > > there is
> > > > > > > > > a
> > > > > > > > > > > GPU
> > > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and
> FPGA
> > > > > > > Resource, a
> > > > > > > > > > > Alibaba
> > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Stephan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > > > > becket.qin@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource
> management
> > > > > support
> > > > > > > is a
> > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > for machine learning use cases. Actually it is
> one
> > of
> > > > the
> > > > > > > mostly
> > > > > > > > > > > asked
> > > > > > > > > > > > > > question from the users who are interested in
> using
> > > > Flink
> > > > > > > for ML.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > > 1. The WebUI / REST API should probably also be
> > > > > mentioned in
> > > > > > > the
> > > > > > > > > > > public
> > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > 2. Is the data structure that holds GPU info
> also a
> > > > > public
> > > > > > > API?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off
> the
> > > > > > > discussion,
> > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Big +1 for this feature. Supporting using of
> GPU
> > in
> > > > > Flink
> > > > > > > is
> > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks
> good
> > > to
> > > > > me. I
> > > > > > > > > think
> > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > > > > > > karmagyz@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We would like to start a discussion thread on
> > > > > "FLIP-108:
> > > > > > > Add
> > > > > > > > > GPU
> > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This FLIP mainly discusses the following
> > issues:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a
> > > task
> > > > > > > executor
> > > > > > > > > and
> > > > > > > > > > > > > > > > forward such requirements to the external
> > > resource
> > > > > > > managers
> > > > > > > > > (for
> > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > > - Provide information of available GPU
> > resources
> > > to
> > > > > > > > > operators.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Key changes proposed in the FLIP are as
> > follows:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Forward GPU resource requirements to
> > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > - Introduce GPUManager as one of the task
> > manager
> > > > > > > services to
> > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > and expose GPU resource information to the
> > > context
> > > > of
> > > > > > > > > functions.
> > > > > > > > > > > > > > > > - Introduce the default script for GPU
> > discovery,
> > > > in
> > > > > > > which we
> > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > the privilege mode to help user to achieve
> > > > > worker-level
> > > > > > > > > isolation
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Please find more details in the FLIP wiki
> > > document
> > > > > [1].
> > > > > > > > > Looking
> > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Stephan Ewen <se...@apache.org>.
Hi all!

The main point I wanted to throw into the discussion is the following:
  - With more and more use cases, more and more tools go into Flink.
  - If everything becomes a "core feature", it will make the project hard
to develop in the future. Thinking in "library" / "plugin" / "extension"
style where possible helps.

  - A good thought experiment is always: How many future developers will
have to interact with this code (and possibly understand it partially), even
if the features they touch have nothing to do with GPU support? If many
contributors to unrelated features will have to touch it and understand it,
then let's think about whether there is a different solution. Maybe there is
not, but then we should be sure why.

  - That led me to raise this issue: If the GPU manager becomes a core
service in the TaskManager, Environment, RuntimeContext, etc., then everyone
developing the TM and streaming tasks needs to understand the GPU manager.
That seems oddly specific to me.
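
To make the "library" / "plugin" idea a bit more concrete, here is a rough
sketch (an illustration only, not what the FLIP proposes) of a GPU manager
that is built from configuration alone and reached through a static accessor,
so that the TaskManager, Environment and RuntimeContext would not have to
carry it. The class name `GpuManagerSketch`, the `Map`-based configuration and
the fake discovery loop are all hypothetical; only the key follows the
taskmanager.resource.{resourceName}.amount schema mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical, self-contained GPU manager; not existing Flink API. */
final class GpuManagerSketch {
    // Assumed key, following the taskmanager.resource.{resourceName}.amount schema.
    static final String AMOUNT_KEY = "taskmanager.resource.gpu.amount";

    private static GpuManagerSketch instance;

    private final List<String> gpuIndexes;

    private GpuManagerSketch(Map<String, String> config) {
        int amount = Integer.parseInt(config.getOrDefault(AMOUNT_KEY, "0"));
        List<String> found = new ArrayList<>();
        // Stand-in for running a discovery script: pretend GPUs are numbered 0..amount-1.
        for (int i = 0; i < amount; i++) {
            found.add(String.valueOf(i));
        }
        this.gpuIndexes = Collections.unmodifiableList(found);
    }

    /** Lazily creates the single instance; later calls ignore the passed configuration. */
    static synchronized GpuManagerSketch get(Map<String, String> config) {
        if (instance == null) {
            instance = new GpuManagerSketch(config);
        }
        return instance;
    }

    /** Read-only view of the discovered GPU indexes. */
    List<String> getGpuIndexes() {
        return gpuIndexes;
    }
}
```

Nothing outside this class needs to know about GPUs in such a setup; the
trade-off is that every caller needs access to the configuration.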

Access to configuration does not seem to be the right reason to do that. We
should expose the Flink configuration from the RuntimeContext anyway.

If GPUs are sliced and assigned during scheduling, there may be a reason,
although it looks like it would then belong to the slot. Is that what we
are doing here?

Best,
Stephan


On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <to...@gmail.com> wrote:

>  Thanks for the feedback, Becket.
>
> IMO, eventually an operator should only see info about the GPUs that are
> dedicated to it, instead of all GPUs on the machine/container as in the
> current design. It does not make sense to make the user who writes a UDF
> worry about coordination among multiple operators running on the same
> machine. And if we want to limit the GPU info an operator sees, we should
> not let the operator instantiate the GPUManager, which means we have to
> expose something through the runtime context: either the GPU info or some
> kind of limited access to the GPUManager.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <be...@gmail.com> wrote:
>
> > It probably makes sense for us to first agree on the final state. More
> > specifically, will the resource info be exposed through runtime context
> > eventually?
> >
> > If that is the final state and we have a seamless migration story from
> > this FLIP to that final state, personally I think it is OK to expose the
> > GPU info in the runtime context.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <to...@gmail.com>
> > wrote:
> >
> > > @Yangze,
> > > I think what Stephan means (@Stephan, please correct me if I'm wrong)
> > > is that we might not need to hold and maintain the GPUManager as a
> > > service in TaskManagerServices or RuntimeContext. An alternative is to
> > > create / retrieve the GPUManager only in the operators that need it,
> > > e.g., with a static method `GPUManager.get()`.
> > >
> > > @Stephan,
> > > I agree with you on excluding GPUManager from TaskManagerServices.
> > >
> > >    - For the first step, where we provide unified TM-level GPU
> > >    information to all operators, it should be fine to have operators
> > >    access / lazily initialize the GPUManager by themselves.
> > >    - In the future, we might have more fine-grained GPU management,
> > >    where we need to maintain the GPUManager as a service and put GPU
> > >    info in slot profiles. But at least for now it's not necessary to
> > >    introduce such complexity.
> > >
> > > However, I have some concerns about excluding GPUManager from
> > > RuntimeContext and letting operators access it directly.
> > >
> > >    - The configuration needed for creating the GPUManager is not always
> > >    available to operators.
> > >    - If later we want to have fine-grained control over GPUs (e.g.,
> > >    operators in each slot can only see GPUs reserved for that slot),
> > >    the approach cannot be easily extended.
> > >
> > > I would suggest wrapping the GPUManager behind RuntimeContext and only
> > > exposing the GPUInfo to users. For now, we can declare a method
> > > `getGPUInfo()` in RuntimeContext, with a default definition that calls
> > > `GPUManager.get()` to get the lazily-created GPUManager. If later we
> > > want to create / retrieve GPUManager in a different way, we can simply
> > > change how `getGPUInfo` is implemented, without needing to change any
> > > public interfaces.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <ka...@gmail.com>
> wrote:
> > >
> > > > @Stephan
> > > > Do you mean MiniCluster? Yes, it makes sense to share the GPU Manager
> > > > in such a scenario.
> > > > If that's what you worry about, I'm +1 for holding
> > > > GPUManager (ExternalResourceManagers) in TaskExecutor instead of
> > > > TaskManagerServices.
> > > >
> > > > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > > > info instead of the GPU Manager. AFAIK, it's the only place we could
> > > > pass GPU info to the RichFunction/UserDefinedFunction.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <isaac@paddlesoft.net
> >
> > > > wrote:
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote
> ----
> > > > >
> > > > > > > Can we somehow keep this out of the TaskManager services
> > > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > > ExternalServicesManagers in future) is conceptually one of the
> task
> > > > > > manager services, just like MemoryManager before 1.10.
> > > > > > - It maintains/holds the GPU resource at TM level and all of the
> > > > > > operators allocate the GPU resources from it. So, it should be
> > > > > > exclusive to a single TaskExecutor.
> > > > > > - We could add a collection called ExternalResourceManagers to
> hold
> > > > > > all managers of other external resources in the future.
> > > > > >
> > > > >
> > > > > Can you help me understand why this needs the addition in
> > > > > TaskManagerServices or in the RuntimeContext?
> > > > > Are you worried about the case when multiple Task Executors run in
> > > > > the same JVM? That's not common, but wouldn't it actually be good in
> > > > > that case to share the GPU Manager, given that the GPU is shared?
> > > > >
> > > > > Thanks,
> > > > > Stephan
> > > > >
> > > > > ---------------------------
> > > > >
> > > > >
> > > > > > What parts need information about this?
> > > > > > In this FLIP, operators need the information. Thus, we expose GPU
> > > > > > information to the RuntimeContext/FunctionContext. The slot
> profile
> > > is
> > > > > > not aware of GPU resources as GPU is TM level resource now.
> > > > > >
> > > > > > > Can the GPU Manager be a "self contained" thing that simply
> takes
> > > the
> > > > > > configuration, and then abstracts everything internally?
> > > > > > Yes, we just pass the path/args of the discover script and how
> many
> > > > > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > > > > information and expose them to the RuntimeContext/FunctionContext
> > of
> > > > > > Operators. Meanwhile, we'd better not allow operators to directly
> > > > > > access GPUManager, it should get what they want from Context. We
> > > could
> > > > > > then decouple the interface/implementation of GPUManager and
> Public
> > > > > > API.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > > > >
> > > > > > > It sounds fine to initially start with GPU specific support and
> > > think
> > > > > > about
> > > > > > > generalizing this once we better understand the space.
> > > > > > >
> > > > > > > About the implementation suggested in FLIP-108:
> > > > > > > - Can we somehow keep this out of the TaskManager services?
> > > Anything
> > > > we
> > > > > > > have to pull through all layers of the TM makes the TM
> components
> > > yet
> > > > > > more
> > > > > > > complex and harder to maintain.
> > > > > > >
> > > > > > > - What parts need information about this?
> > > > > > > -> do the slot profiles need information about the GPU?
> > > > > > > -> Can the GPU Manager be a "self contained" thing that simply
> > > takes
> > > > > > > the configuration, and then abstracts everything internally?
> > > > Operators
> > > > > > can
> > > > > > > access it via "GPUManager.get()" or so?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <ka...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Thanks for all the feedbacks.
> > > > > > > >
> > > > > > > > @Becket
> > > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them
> to
> > > the
> > > > > > > > Public API section.
> > > > > > > >
> > > > > > > >
> > > > > > > > @Stephan @Becket
> > > > > > > > Regarding the general extended resource mechanism, I second
> > > > Xintong's
> > > > > > > > suggestion.
> > > > > > > > - It's better to leverage ResourceProfile and ResourceSpec
> > after
> > > we
> > > > > > > > supporting fine-grained GPU scheduling. As a first step
> > > proposal, I
> > > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > > > correctly, it just a code refactoring atm, we could extract
> the
> > > > > > > > open/close/allocateExtendResources of GPUManager to that
> > > > interface. If
> > > > > > > > that is the case, +1 to do it during implementation.
> > > > > > > >
> > > > > > > > @Xingbo
> > > > > > > > As Xintong said, we looked into how Spark supports a general
> > > > "Custom
> > > > > > > > Resource Scheduling" before and decided to introduce a common
> > > > resource
> > > > > > > > configuration
> > > > > > > >
> > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > to make it more extensible. I think the "resource" is a
> proper
> > > > level
> > > > > > > > to contain all the configs of extended resources.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> > hxbks2ks@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > >
> > > > > > > > > There is no doubt that GPU resource management support will
> > > > greatly
> > > > > > > > > facilitate the development of AI-related applications by
> > > PyFlink
> > > > > > users.
> > > > > > > > >
> > > > > > > > > I have only one comment about this wiki:
> > > > > > > > >
> > > > > > > > > Regarding the names of several GPU configurations, I think
> it
> > > is
> > > > > > better
> > > > > > > > to
> > > > > > > > > delete the resource field makes it consistent with the
> names
> > of
> > > > other
> > > > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > > > >
> > > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Xingbo
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Xintong Song <to...@gmail.com> 于2020年3月4日周三
> 上午10:39写道:
> > > > > > > > >
> > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > >
> > > > > > > > > > Actually, Yangze, Yang and I also had an offline
> discussion
> > > > about
> > > > > > > > making
> > > > > > > > > > the "GPU Support" as some general "Extended Resource
> > > Support".
> > > > We
> > > > > > > > believe
> > > > > > > > > > supporting extended resources in a general mechanism is
> > > > definitely
> > > > > > a
> > > > > > > > good
> > > > > > > > > > and extensible way. The reason we propose this FLIP
> > narrowing
> > > > its
> > > > > > scope
> > > > > > > > > > down to GPU alone, is mainly for the concern on extra
> > efforts
> > > > and
> > > > > > > > review
> > > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > > >
> > > > > > > > > > To come up with a well design on a general extended
> > resource
> > > > > > management
> > > > > > > > > > mechanism, we would need to investigate more on how
> people
> > > use
> > > > > > > > different
> > > > > > > > > > kind of resources in practice. For GPU, we learnt such
> > > > knowledge
> > > > > > from
> > > > > > > > the
> > > > > > > > > > experts, Becket and his team members. But for FPGA, or
> > other
> > > > > > potential
> > > > > > > > > > extended resources, we don't have such convenient
> > information
> > > > > > sources,
> > > > > > > > > > making the investigation requires more efforts, which I
> > tend
> > > to
> > > > > > think
> > > > > > > > is
> > > > > > > > > > not necessary atm.
> > > > > > > > > >
> > > > > > > > > > On the other hand, we also looked into how Spark
> supports a
> > > > general
> > > > > > > > "Custom
> > > > > > > > > > Resource Scheduling". Assuming we want to have a similar
> > > > general
> > > > > > > > extended
> > > > > > > > > > resource mechanism in the future, we believe that the
> > current
> > > > GPU
> > > > > > > > support
> > > > > > > > > > design can be easily extended, in an incremental way
> > without
> > > > too
> > > > > > many
> > > > > > > > > > reworks.
> > > > > > > > > >
> > > > > > > > > > - The most important part is probably user interfaces.
> > Spark
> > > > > > offers
> > > > > > > > > > configuration options to define the amount, discovery
> > script
> > > > and
> > > > > > > > vendor
> > > > > > > > > > (on
> > > > > > > > > > k8s) in a per resource type bias [1], which is very
> similar
> > > to
> > > > > > what
> > > > > > > > we
> > > > > > > > > > proposed in this FLIP. I think it's not necessary to
> expose
> > > > > > config
> > > > > > > > > > options
> > > > > > > > > > in the general way atm, since we do not have supports for
> > > other
> > > > > > > > resource
> > > > > > > > > > types now. If later we decided to have per resource type
> > > config
> > > > > > > > > > options, we
> > > > > > > > > > can have backwards compatibility on the current proposed
> > > > options
> > > > > > > > with
> > > > > > > > > > simple key mapping.
> > > > > > > > > > - For the GPU Manager, if later needed we can change it
> to
> > a
> > > > > > > > "Extended
> > > > > > > > > > Resource Manager" (or whatever it is called). That should
> > be
> > > a
> > > > > > pure
> > > > > > > > > > component-internal refactoring.
> > > > > > > > > > - For ResourceProfile and ResourceSpec, there are already
> > > > > > fields for
> > > > > > > > > > general extended resource. We can of course leverage them
> > > when
> > > > > > > > > > supporting
> > > > > > > > > > fine grained GPU scheduling. That is also not in the
> scope
> > of
> > > > > > this
> > > > > > > > first
> > > > > > > > > > step proposal, and would require FLIP-56 to be finished
> > > first.
> > > > > > > > > >
> > > > > > > > > > To summary up, I agree with Becket that have a separate
> > FLIP
> > > > for
> > > > > > the
> > > > > > > > > > general extended resource mechanism, and keep it in mind
> > when
> > > > > > > > discussing
> > > > > > > > > > and implementing the current one.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
> >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > > becket.qin@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > That's a good point, Stephan. It makes total sense to
> > > > generalize
> > > > > > the
> > > > > > > > > > > resource management to support custom resources. Having
> > > that
> > > > > > allows
> > > > > > > > users
> > > > > > > > > > > to add new resources by themselves. The general
> resource
> > > > > > management
> > > > > > > > may
> > > > > > > > > > > involve two different aspects:
> > > > > > > > > > >
> > > > > > > > > > > 1. The custom resource type definition. It is supported
> > by
> > > > the
> > > > > > > > extended
> > > > > > > > > > > resources in ResourceProfile and ResourceSpec. This
> will
> > > > likely
> > > > > > cover
> > > > > > > > > > > majority of the cases.
> > > > > > > > > > >
> > > > > > > > > > > 2. The custom resource allocation logic, i.e. how to
> > assign
> > > > the
> > > > > > > > resources
> > > > > > > > > > > to different tasks, operators, and so on. This may
> > require
> > > > two
> > > > > > > > levels /
> > > > > > > > > > > steps:
> > > > > > > > > > > a. Subtask level - make sure the subtasks are put into
> > > > > > suitable
> > > > > > > > > > slots.
> > > > > > > > > > > It is done by the global RM and is not customizable
> right
> > > > now.
> > > > > > > > > > > b. Operator level - map the exact resource to the
> > operators
> > > > > > in
> > > > > > > > TM.
> > > > > > > > > > e.g.
> > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step
> is
> > > > needed
> > > > > > > > assuming
> > > > > > > > > > > the global RM does not distinguish individual resources
> > of
> > > > the
> > > > > > same
> > > > > > > > type.
> > > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > > >
> > > > > > > > > > > The GPU manager is designed to do 2.b here. So it
> should
> > > > > > discover the
> > > > > > > > > > > physical GPU information and bind/match them to each
> > > > operators.
> > > > > > > > Making
> > > > > > > > > > this
> > > > > > > > > > > general will fill in the missing piece to support
> custom
> > > > resource
> > > > > > > > type
> > > > > > > > > > > definition. But I'd avoid calling it a "External
> Resource
> > > > > > Manager" to
> > > > > > > > > > avoid
> > > > > > > > > > > confusion with RM, maybe something like "Operator
> > Resource
> > > > > > Assigner"
> > > > > > > > > > would
> > > > > > > > > > > be more accurate. So for each resource type users can
> > have
> > > an
> > > > > > > > optional
> > > > > > > > > > > "Operator Resource Assigner" in the TM. For memory,
> users
> > > > don't
> > > > > > need
> > > > > > > > > > this,
> > > > > > > > > > > but for other extended resources, users may need that.
> > > > > > > > > > >
> > > > > > > > > > > Personally I think a pluggable "Operator Resource
> > Assigner"
> > > > is
> > > > > > > > achievable
> > > > > > > > > > > in this FLIP. But I am also OK with having that in a
> > > separate
> > > > > > FLIP
> > > > > > > > > > because
> > > > > > > > > > > the interface between the "Operator Resource Assigner"
> > and
> > > > > > operator
> > > > > > > > may
> > > > > > > > > > > take a while to settle down if we want to make it
> > generic.
> > > > But I
> > > > > > > > think
> > > > > > > > > > our
> > > > > > > > > > > implementation should take this future work into
> > > > consideration so
> > > > > > > > that we
> > > > > > > > > > > don't need to break backwards compatibility once we
> have
> > > > that.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > > > sewen@apache.org>
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > > >
> > > > > > > > > > > > I cannot really give much input into the mechanics of
> > > > GPU-aware
> > > > > > > > > > > scheduling
> > > > > > > > > > > > and GPU allocation, as I have no experience with
> that.
> > > > > > > > > > > >
> > > > > > > > > > > > One thought I had when reading the proposal is if it
> > > makes
> > > > > > sense to
> > > > > > > > > > look
> > > > > > > > > > > at
> > > > > > > > > > > > the "GPU Manager" as an "External Resource Manager",
> > and
> > > > GPU
> > > > > > is one
> > > > > > > > > > such
> > > > > > > > > > > > resource.
> > > > > > > > > > > > The way I understand the ResourceProfile and
> > > ResourceSpec,
> > > > > > that is
> > > > > > > > how
> > > > > > > > > > it
> > > > > > > > > > > > is done there.
> > > > > > > > > > > > It has the advantage that it looks more extensible.
> > Maybe
> > > > > > there is
> > > > > > > > a
> > > > > > > > > > GPU
> > > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> > > > > > Resource, a
> > > > > > > > > > Alibaba
> > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Stephan
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > > > becket.qin@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management
> > > > support
> > > > > > is a
> > > > > > > > > > > > must-have
> > > > > > > > > > > > > for machine learning use cases. Actually it is one
> of
> > > the
> > > > > > mostly
> > > > > > > > > > asked
> > > > > > > > > > > > > question from the users who are interested in using
> > > Flink
> > > > > > for ML.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > > 1. The WebUI / REST API should probably also be
> > > > mentioned in
> > > > > > the
> > > > > > > > > > public
> > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > 2. Is the data structure that holds GPU info also a
> > > > public
> > > > > > API?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the
> > > > > > discussion,
> > > > > > > > > > Yangze.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Big +1 for this feature. Supporting using of GPU
> in
> > > > Flink
> > > > > > is
> > > > > > > > > > > > significant,
> > > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good
> > to
> > > > me. I
> > > > > > > > think
> > > > > > > > > > > it's a
> > > > > > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > > > > > karmagyz@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We would like to start a discussion thread on
> > > > "FLIP-108:
> > > > > > Add
> > > > > > > > GPU
> > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This FLIP mainly discusses the following
> issues:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Enable user to configure how many GPUs in a
> > task
> > > > > > executor
> > > > > > > > and
> > > > > > > > > > > > > > > forward such requirements to the external
> > resource
> > > > > > managers
> > > > > > > > (for
> > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > > - Provide information of available GPU
> resources
> > to
> > > > > > > > operators.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Key changes proposed in the FLIP are as
> follows:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Forward GPU resource requirements to
> > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > - Introduce GPUManager as one of the task
> manager
> > > > > > services to
> > > > > > > > > > > > discover
> > > > > > > > > > > > > > > and expose GPU resource information to the
> > context
> > > of
> > > > > > > > functions.
> > > > > > > > > > > > > > > - Introduce the default script for GPU
> discovery,
> > > in
> > > > > > which we
> > > > > > > > > > > provide
> > > > > > > > > > > > > > > the privilege mode to help user to achieve
> > > > worker-level
> > > > > > > > isolation
> > > > > > > > > > > in
> > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please find more details in the FLIP wiki
> > document
> > > > [1].
> > > > > > > > Looking
> > > > > > > > > > > > forward
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Yangze Guo

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Xintong Song <to...@gmail.com>.
Thanks for the feedback, Becket.

IMO, eventually an operator should only see info about the GPUs that are
dedicated to it, instead of all GPUs on the machine/container as in the
current design. It does not make sense to make the user who writes a UDF
worry about coordination among multiple operators running on the same
machine. And if we want to limit the GPU info an operator sees, we should not
let the operator instantiate the GPUManager, which means we have to expose
something through the runtime context: either the GPU info or some kind of
limited access to the GPUManager.
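
As a rough illustration of that final state (a sketch under my own
assumptions, not the FLIP's interfaces), the context handed to a UDF could
return only the GPUs assigned to its operator, while the object that knows
about all GPUs stays on the runtime side. `GpuInfoSketch`,
`TmGpuRegistrySketch` and `OperatorContextSketch` are hypothetical names, and
the per-operator assignment is simply hard-wired by the caller here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical value object describing one GPU; not existing Flink API. */
final class GpuInfoSketch {
    private final String index;

    GpuInfoSketch(String index) {
        this.index = index;
    }

    String getIndex() {
        return index;
    }
}

/** Knows all GPUs of the machine/container; never handed to user code. */
final class TmGpuRegistrySketch {
    private final Map<String, Set<GpuInfoSketch>> byOperator = new HashMap<>();

    /** Dedicates the given GPU indexes to one operator (assignment policy is out of scope). */
    void assign(String operatorId, String... gpuIndexes) {
        Set<GpuInfoSketch> gpus = new LinkedHashSet<>();
        for (String idx : gpuIndexes) {
            gpus.add(new GpuInfoSketch(idx));
        }
        byOperator.put(operatorId, Collections.unmodifiableSet(gpus));
    }

    Set<GpuInfoSketch> gpusFor(String operatorId) {
        return byOperator.getOrDefault(operatorId, Collections.<GpuInfoSketch>emptySet());
    }
}

/** What a UDF would see: only the GPUs dedicated to its own operator. */
final class OperatorContextSketch {
    private final String operatorId;
    private final TmGpuRegistrySketch registry;

    OperatorContextSketch(String operatorId, TmGpuRegistrySketch registry) {
        this.operatorId = operatorId;
        this.registry = registry;
    }

    Set<GpuInfoSketch> getGPUInfo() {
        return registry.gpusFor(operatorId);
    }
}
```

The UDF author only ever calls `getGPUInfo()`, so how the GPUs are assigned
behind it can change later without touching user code.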

Thank you~

Xintong Song



On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <be...@gmail.com> wrote:

> It probably makes sense for us to first agree on the final state. More
> specifically, will the resource info be exposed through runtime context
> eventually?
>
> If that is the final state and we have a seamless migration story from this
> FLIP to that final state, personally I think it is OK to expose the GPU
> info in the runtime context.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <to...@gmail.com>
> wrote:
>
> > @Yangze,
> > I think what Stephan means (@Stephan, please correct me if I'm wrong) is
> > that we might not need to hold and maintain the GPUManager as a service
> > in TaskManagerServices or RuntimeContext. An alternative is to create /
> > retrieve the GPUManager only in the operators that need it, e.g., with a
> > static method `GPUManager.get()`.
> >
> > @Stephan,
> > I agree with you on excluding GPUManager from TaskManagerServices.
> >
> >    - For the first step, where we provide unified TM-level GPU
> >    information to all operators, it should be fine to have operators
> >    access / lazily initialize the GPUManager by themselves.
> >    - In the future, we might have more fine-grained GPU management,
> >    where we need to maintain the GPUManager as a service and put GPU info
> >    in slot profiles. But at least for now it's not necessary to introduce
> >    such complexity.
> >
> > However, I have some concerns about excluding GPUManager from
> > RuntimeContext and letting operators access it directly.
> >
> >    - The configuration needed for creating the GPUManager is not always
> >    available to operators.
> >    - If later we want to have fine-grained control over GPUs (e.g.,
> >    operators in each slot can only see GPUs reserved for that slot), the
> >    approach cannot be easily extended.
> >
> > I would suggest wrapping the GPUManager behind RuntimeContext and only
> > exposing the GPUInfo to users. For now, we can declare a method
> > `getGPUInfo()` in RuntimeContext, with a default definition that calls
> > `GPUManager.get()` to get the lazily-created GPUManager. If later we want
> > to create / retrieve GPUManager in a different way, we can simply change
> > how `getGPUInfo` is implemented, without needing to change any public
> > interfaces.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > @Stephan
> > > Do you mean MiniCluster? Yes, it makes sense to share the GPU Manager
> > > in such a scenario.
> > > If that's what you worry about, I'm +1 for holding
> > > GPUManager (ExternalResourceManagers) in TaskExecutor instead of
> > > TaskManagerServices.
> > >
> > > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > > info instead of the GPU Manager. AFAIK, it's the only place we could
> > > pass GPU info to the RichFunction/UserDefinedFunction.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <is...@paddlesoft.net>
> > > wrote:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote ----
> > > >
> > > > > > Can we somehow keep this out of the TaskManager services
> > > > > I fear that we could not. IMO, the GPUManager(or
> > > > > ExternalServicesManagers in future) is conceptually one of the task
> > > > > manager services, just like MemoryManager before 1.10.
> > > > > - It maintains/holds the GPU resource at TM level and all of the
> > > > > operators allocate the GPU resources from it. So, it should be
> > > > > exclusive to a single TaskExecutor.
> > > > > - We could add a collection called ExternalResourceManagers to hold
> > > > > all managers of other external resources in the future.
> > > > >
> > > >
> > > > > Can you help me understand why this needs the addition in
> > > > > TaskManagerServices or in the RuntimeContext?
> > > > > Are you worried about the case when multiple Task Executors run in
> > > > > the same JVM? That's not common, but wouldn't it actually be good in
> > > > > that case to share the GPU Manager, given that the GPU is shared?
> > > >
> > > > Thanks,
> > > > Stephan
> > > >
> > > > ---------------------------
> > > >
> > > >
> > > > > What parts need information about this?
> > > > > In this FLIP, operators need the information. Thus, we expose GPU
> > > > > information to the RuntimeContext/FunctionContext. The slot profile
> > is
> > > > > not aware of GPU resources as GPU is TM level resource now.
> > > > >
> > > > > > Can the GPU Manager be a "self contained" thing that simply takes
> > the
> > > > > configuration, and then abstracts everything internally?
> > > > > Yes, we just pass the path/args of the discover script and how many
> > > > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > > > information and expose them to the RuntimeContext/FunctionContext
> of
> > > > > Operators. Meanwhile, we'd better not allow operators to directly
> > > > > access GPUManager, it should get what they want from Context. We
> > could
> > > > > then decouple the interface/implementation of GPUManager and Public
> > > > > API.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org>
> > wrote:
> > > > > >
> > > > > > It sounds fine to initially start with GPU specific support and
> > think
> > > > > about
> > > > > > generalizing this once we better understand the space.
> > > > > >
> > > > > > About the implementation suggested in FLIP-108:
> > > > > > - Can we somehow keep this out of the TaskManager services?
> > Anything
> > > we
> > > > > > have to pull through all layers of the TM makes the TM components
> > yet
> > > > > more
> > > > > > complex and harder to maintain.
> > > > > >
> > > > > > - What parts need information about this?
> > > > > > -> do the slot profiles need information about the GPU?
> > > > > > -> Can the GPU Manager be a "self contained" thing that simply
> > takes
> > > > > > the configuration, and then abstracts everything internally?
> > > Operators
> > > > > can
> > > > > > access it via "GPUManager.get()" or so?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <ka...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Thanks for all the feedbacks.
> > > > > > >
> > > > > > > @Becket
> > > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to
> > the
> > > > > > > Public API section.
> > > > > > >
> > > > > > >
> > > > > > > @Stephan @Becket
> > > > > > > Regarding the general extended resource mechanism, I second
> > > Xintong's
> > > > > > > suggestion.
> > > > > > > - It's better to leverage ResourceProfile and ResourceSpec
> after
> > we
> > > > > > > supporting fine-grained GPU scheduling. As a first step
> > proposal, I
> > > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > > correctly, it just a code refactoring atm, we could extract the
> > > > > > > open/close/allocateExtendResources of GPUManager to that
> > > interface. If
> > > > > > > that is the case, +1 to do it during implementation.
> > > > > > >
> > > > > > > @Xingbo
> > > > > > > As Xintong said, we looked into how Spark supports a general
> > > "Custom
> > > > > > > Resource Scheduling" before and decided to introduce a common
> > > resource
> > > > > > > configuration
> > > > > > >
> > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > to make it more extensible. I think the "resource" is a proper
> > > level
> > > > > > > to contain all the configs of extended resources.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <
> hxbks2ks@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > >
> > > > > > > > There is no doubt that GPU resource management support will
> > > greatly
> > > > > > > > facilitate the development of AI-related applications by
> > PyFlink
> > > > > users.
> > > > > > > >
> > > > > > > > I have only one comment about this wiki:
> > > > > > > >
> > > > > > > > Regarding the names of several GPU configurations, I think it
> > is
> > > > > better
> > > > > > > to
> > > > > > > > delete the resource field makes it consistent with the names
> of
> > > other
> > > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > > >
> > > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Xingbo
> > > > > > > >
> > > > > > > >
> > > > > > > > Xintong Song <to...@gmail.com> 于2020年3月4日周三 上午10:39写道:
> > > > > > > >
> > > > > > > > > @Stephan, @Becket,
> > > > > > > > >
> > > > > > > > > Actually, Yangze, Yang and I also had an offline discussion
> > > about
> > > > > > > making
> > > > > > > > > the "GPU Support" as some general "Extended Resource
> > Support".
> > > We
> > > > > > > believe
> > > > > > > > > supporting extended resources in a general mechanism is
> > > definitely
> > > > > a
> > > > > > > good
> > > > > > > > > and extensible way. The reason we propose this FLIP
> narrowing
> > > its
> > > > > scope
> > > > > > > > > down to GPU alone, is mainly for the concern on extra
> efforts
> > > and
> > > > > > > review
> > > > > > > > > capacity needed for a general mechanism.
> > > > > > > > >
> > > > > > > > > To come up with a well design on a general extended
> resource
> > > > > management
> > > > > > > > > mechanism, we would need to investigate more on how people
> > use
> > > > > > > different
> > > > > > > > > kind of resources in practice. For GPU, we learnt such
> > > knowledge
> > > > > from
> > > > > > > the
> > > > > > > > > experts, Becket and his team members. But for FPGA, or
> other
> > > > > potential
> > > > > > > > > extended resources, we don't have such convenient
> information
> > > > > sources,
> > > > > > > > > making the investigation requires more efforts, which I
> tend
> > to
> > > > > think
> > > > > > > is
> > > > > > > > > not necessary atm.
> > > > > > > > >
> > > > > > > > > On the other hand, we also looked into how Spark supports a
> > > general
> > > > > > > "Custom
> > > > > > > > > Resource Scheduling". Assuming we want to have a similar
> > > general
> > > > > > > extended
> > > > > > > > > resource mechanism in the future, we believe that the
> current
> > > GPU
> > > > > > > support
> > > > > > > > > design can be easily extended, in an incremental way
> without
> > > too
> > > > > many
> > > > > > > > > reworks.
> > > > > > > > >
> > > > > > > > > - The most important part is probably user interfaces.
> Spark
> > > > > offers
> > > > > > > > > configuration options to define the amount, discovery
> script
> > > and
> > > > > > > vendor
> > > > > > > > > (on
> > > > > > > > > k8s) in a per resource type bias [1], which is very similar
> > to
> > > > > what
> > > > > > > we
> > > > > > > > > proposed in this FLIP. I think it's not necessary to expose
> > > > > config
> > > > > > > > > options
> > > > > > > > > in the general way atm, since we do not have supports for
> > other
> > > > > > > resource
> > > > > > > > > types now. If later we decided to have per resource type
> > config
> > > > > > > > > options, we
> > > > > > > > > can have backwards compatibility on the current proposed
> > > options
> > > > > > > with
> > > > > > > > > simple key mapping.
> > > > > > > > > - For the GPU Manager, if later needed we can change it to
> a
> > > > > > > "Extended
> > > > > > > > > Resource Manager" (or whatever it is called). That should
> be
> > a
> > > > > pure
> > > > > > > > > component-internal refactoring.
> > > > > > > > > - For ResourceProfile and ResourceSpec, there are already
> > > > > fields for
> > > > > > > > > general extended resource. We can of course leverage them
> > when
> > > > > > > > > supporting
> > > > > > > > > fine grained GPU scheduling. That is also not in the scope
> of
> > > > > this
> > > > > > > first
> > > > > > > > > step proposal, and would require FLIP-56 to be finished
> > first.
> > > > > > > > >
> > > > > > > > > To summary up, I agree with Becket that have a separate
> FLIP
> > > for
> > > > > the
> > > > > > > > > general extended resource mechanism, and keep it in mind
> when
> > > > > > > discussing
> > > > > > > > > and implementing the current one.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > >
> > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > > becket.qin@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > That's a good point, Stephan. It makes total sense to
> > > generalize
> > > > > the
> > > > > > > > > > resource management to support custom resources. Having
> > that
> > > > > allows
> > > > > > > users
> > > > > > > > > > to add new resources by themselves. The general resource
> > > > > management
> > > > > > > may
> > > > > > > > > > involve two different aspects:
> > > > > > > > > >
> > > > > > > > > > 1. The custom resource type definition. It is supported
> by
> > > the
> > > > > > > extended
> > > > > > > > > > resources in ResourceProfile and ResourceSpec. This will
> > > likely
> > > > > cover
> > > > > > > > > > majority of the cases.
> > > > > > > > > >
> > > > > > > > > > 2. The custom resource allocation logic, i.e. how to
> assign
> > > the
> > > > > > > resources
> > > > > > > > > > to different tasks, operators, and so on. This may
> require
> > > two
> > > > > > > levels /
> > > > > > > > > > steps:
> > > > > > > > > > a. Subtask level - make sure the subtasks are put into
> > > > > suitable
> > > > > > > > > slots.
> > > > > > > > > > It is done by the global RM and is not customizable right
> > > now.
> > > > > > > > > > b. Operator level - map the exact resource to the
> operators
> > > > > in
> > > > > > > TM.
> > > > > > > > > e.g.
> > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is
> > > needed
> > > > > > > assuming
> > > > > > > > > > the global RM does not distinguish individual resources
> of
> > > the
> > > > > same
> > > > > > > type.
> > > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > > >
> > > > > > > > > > The GPU manager is designed to do 2.b here. So it should
> > > > > discover the
> > > > > > > > > > physical GPU information and bind/match them to each
> > > operators.
> > > > > > > Making
> > > > > > > > > this
> > > > > > > > > > general will fill in the missing piece to support custom
> > > resource
> > > > > > > type
> > > > > > > > > > definition. But I'd avoid calling it a "External Resource
> > > > > Manager" to
> > > > > > > > > avoid
> > > > > > > > > > confusion with RM, maybe something like "Operator
> Resource
> > > > > Assigner"
> > > > > > > > > would
> > > > > > > > > > be more accurate. So for each resource type users can
> have
> > an
> > > > > > > optional
> > > > > > > > > > "Operator Resource Assigner" in the TM. For memory, users
> > > don't
> > > > > need
> > > > > > > > > this,
> > > > > > > > > > but for other extended resources, users may need that.
> > > > > > > > > >
> > > > > > > > > > Personally I think a pluggable "Operator Resource
> Assigner"
> > > is
> > > > > > > achievable
> > > > > > > > > > in this FLIP. But I am also OK with having that in a
> > separate
> > > > > FLIP
> > > > > > > > > because
> > > > > > > > > > the interface between the "Operator Resource Assigner"
> and
> > > > > operator
> > > > > > > may
> > > > > > > > > > take a while to settle down if we want to make it
> generic.
> > > But I
> > > > > > > think
> > > > > > > > > our
> > > > > > > > > > implementation should take this future work into
> > > consideration so
> > > > > > > that we
> > > > > > > > > > don't need to break backwards compatibility once we have
> > > that.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > > sewen@apache.org>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > > >
> > > > > > > > > > > I cannot really give much input into the mechanics of
> > > GPU-aware
> > > > > > > > > > scheduling
> > > > > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > > > > >
> > > > > > > > > > > One thought I had when reading the proposal is if it
> > makes
> > > > > sense to
> > > > > > > > > look
> > > > > > > > > > at
> > > > > > > > > > > the "GPU Manager" as an "External Resource Manager",
> and
> > > GPU
> > > > > is one
> > > > > > > > > such
> > > > > > > > > > > resource.
> > > > > > > > > > > The way I understand the ResourceProfile and
> > ResourceSpec,
> > > > > that is
> > > > > > > how
> > > > > > > > > it
> > > > > > > > > > > is done there.
> > > > > > > > > > > It has the advantage that it looks more extensible.
> Maybe
> > > > > there is
> > > > > > > a
> > > > > > > > > GPU
> > > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> > > > > Resource, a
> > > > > > > > > Alibaba
> > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Stephan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > > becket.qin@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management
> > > support
> > > > > is a
> > > > > > > > > > > must-have
> > > > > > > > > > > > for machine learning use cases. Actually it is one of
> > the
> > > > > mostly
> > > > > > > > > asked
> > > > > > > > > > > > question from the users who are interested in using
> > Flink
> > > > > for ML.
> > > > > > > > > > > >
> > > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > > 1. The WebUI / REST API should probably also be
> > > mentioned in
> > > > > the
> > > > > > > > > public
> > > > > > > > > > > > interface section.
> > > > > > > > > > > > 2. Is the data structure that holds GPU info also a
> > > public
> > > > > API?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > > > > tonysong820@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the
> > > > > discussion,
> > > > > > > > > Yangze.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Big +1 for this feature. Supporting using of GPU in
> > > Flink
> > > > > is
> > > > > > > > > > > significant,
> > > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good
> to
> > > me. I
> > > > > > > think
> > > > > > > > > > it's a
> > > > > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > >
> > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > > > > karmagyz@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We would like to start a discussion thread on
> > > "FLIP-108:
> > > > > Add
> > > > > > > GPU
> > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Enable user to configure how many GPUs in a
> task
> > > > > executor
> > > > > > > and
> > > > > > > > > > > > > > forward such requirements to the external
> resource
> > > > > managers
> > > > > > > (for
> > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > > - Provide information of available GPU resources
> to
> > > > > > > operators.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Forward GPU resource requirements to
> > > Yarn/Kubernetes.
> > > > > > > > > > > > > > - Introduce GPUManager as one of the task manager
> > > > > services to
> > > > > > > > > > > discover
> > > > > > > > > > > > > > and expose GPU resource information to the
> context
> > of
> > > > > > > functions.
> > > > > > > > > > > > > > - Introduce the default script for GPU discovery,
> > in
> > > > > which we
> > > > > > > > > > provide
> > > > > > > > > > > > > > the privilege mode to help user to achieve
> > > worker-level
> > > > > > > isolation
> > > > > > > > > > in
> > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please find more details in the FLIP wiki
> document
> > > [1].
> > > > > > > Looking
> > > > > > > > > > > forward
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Becket Qin <be...@gmail.com>.
It probably makes sense for us to first agree on the final state. More
specifically, will the resource info eventually be exposed through the
runtime context?

If that is the final state and we have a seamless migration story from this
FLIP to that final state, I personally think it is OK to expose the GPU
info in the runtime context.
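
To make the option under discussion concrete, here is a minimal sketch of
the `getGPUInfo()` default method that lazily delegates to
`GPUManager.get()`, as suggested further down in this thread. Everything in
it is an assumption for illustration: `RuntimeContextSketch` is a stand-in
and not Flink's actual RuntimeContext, and the shapes of `GPUInfo` and
`GPUManager` (including the placeholder GPU index "0") are hypothetical.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical read-only view of the GPUs visible to an operator. */
final class GPUInfo {
    private final List<String> indexes;

    GPUInfo(List<String> indexes) {
        this.indexes = Collections.unmodifiableList(indexes);
    }

    List<String> getIndexes() {
        return indexes;
    }
}

/** Hypothetical manager, created lazily on first access. */
final class GPUManager {
    private static volatile GPUManager instance;
    private final GPUInfo info;

    private GPUManager(GPUInfo info) {
        this.info = info;
    }

    /** Double-checked locking so that only one manager is created per JVM. */
    static GPUManager get() {
        GPUManager local = instance;
        if (local == null) {
            synchronized (GPUManager.class) {
                local = instance;
                if (local == null) {
                    // Assumption: real discovery would run the configured
                    // discovery script; a fixed placeholder is used here.
                    local = new GPUManager(new GPUInfo(Collections.singletonList("0")));
                    instance = local;
                }
            }
        }
        return local;
    }

    GPUInfo getGPUInfo() {
        return info;
    }
}

/** Stand-in for the runtime context; only GPUInfo is exposed to users. */
interface RuntimeContextSketch {
    default GPUInfo getGPUInfo() {
        return GPUManager.get().getGPUInfo();
    }
}
```

With this shape, user code only ever calls `getGPUInfo()`. If the manager is
later created or retrieved differently (for example per slot), only the body
of the default method would need to change, not the public interface.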

Thanks,

Jiangjie (Becket) Qin

On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <to...@gmail.com> wrote:

> @Yangze,
> I think what Stephan means (@Stephan, please correct me if I'm wrong) is
> that, we might not need to hold and maintain the GPUManager as a service in
> TaskManagerServices or RuntimeContext. An alternative is to create /
> retrieve the GPUManager only in the operators that need it, e.g., with a
> static method `GPUManager.get()`.
>
> @Stephan,
> I agree with you on excluding GPUManager from TaskManagerServices.
>
>    - For the first step, where we provide unified TM-level GPU information
>    to all operators, it should be fine to have operators access /
>    lazily initialize GPUManager by themselves.
>    - In the future, we might have some more fine-grained GPU management, where
>    we need to maintain GPUManager as a service and put GPU info in slot
>    profiles. But at least for now it's not necessary to introduce such
>    complexity.
>
> However, I have some concerns about excluding GPUManager from RuntimeContext
> and letting operators access it directly.
>
>    - The configuration needed for creating the GPUManager is not always
>    available to operators.
>    - If later we want to have fine-grained control over GPU (e.g.,
>    operators in each slot can only see GPUs reserved for that slot), the
>    approach cannot be easily extended.
>
> I would suggest wrapping the GPUManager behind RuntimeContext and only
> exposing the GPUInfo to users. For now, we can declare a method
> `getGPUInfo()` in RuntimeContext, with a default definition that calls
> `GPUManager.get()` to get the lazily-created GPUManager. If later we want
> to create / retrieve GPUManager in a different way, we can simply change
> how `getGPUInfo` is implemented, without needing to change any public
> interfaces.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <ka...@gmail.com> wrote:
>
> > @Stephan
> > Do you mean Minicluster? Yes, it makes sense to share the GPU Manager
> > in such scenario.
> > If that's what you worry about, I'm +1 for holding
> > GPUManager(ExternalResourceManagers) in TaskExecutor instead of
> > TaskManagerServices.
> >
> > Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> > info instead of the GPU Manager. AFAIK, it's the only place we could
> > pass GPU info to the RichFunction/UserDefinedFunction.
> >
> > Best,
> > Yangze Guo
> >
> > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <is...@paddlesoft.net>
> > wrote:
> > >
> > >
> > >
> > >
> > >
> > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote ----
> > >
> > > > > Can we somehow keep this out of the TaskManager services
> > > > I fear that we could not. IMO, the GPUManager(or
> > > > ExternalServicesManagers in future) is conceptually one of the task
> > > > manager services, just like MemoryManager before 1.10.
> > > > - It maintains/holds the GPU resource at TM level and all of the
> > > > operators allocate the GPU resources from it. So, it should be
> > > > exclusive to a single TaskExecutor.
> > > > - We could add a collection called ExternalResourceManagers to hold
> > > > all managers of other external resources in the future.
> > > >
> > >
> > > Can you help me understand why this needs the addition in
> > > TaskManagerServices or in the RuntimeContext?
> > > Are you worried about the case when multiple Task Executors run in the
> > > same JVM? That's not common, but wouldn't it actually be good in that
> > > case to share the GPU Manager, given that the GPU is shared?
> > > Thanks,
> > > Stephan
> > >
> > > ---------------------------
> > >
> > >
> > > > What parts need information about this?
> > > > In this FLIP, operators need the information. Thus, we expose GPU
> > > > information to the RuntimeContext/FunctionContext. The slot profile
> is
> > > > not aware of GPU resources as GPU is TM level resource now.
> > > >
> > > > > Can the GPU Manager be a "self contained" thing that simply takes
> the
> > > > configuration, and then abstracts everything internally?
> > > > Yes, we just pass the path/args of the discover script and how many
> > > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > > information and expose them to the RuntimeContext/FunctionContext of
> > > > Operators. Meanwhile, we'd better not allow operators to directly
> > > > access GPUManager, it should get what they want from Context. We
> could
> > > > then decouple the interface/implementation of GPUManager and Public
> > > > API.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org>
> wrote:
> > > > >
> > > > > It sounds fine to initially start with GPU specific support and
> think
> > > > about
> > > > > generalizing this once we better understand the space.
> > > > >
> > > > > About the implementation suggested in FLIP-108:
> > > > > - Can we somehow keep this out of the TaskManager services?
> Anything
> > we
> > > > > have to pull through all layers of the TM makes the TM components
> yet
> > > > more
> > > > > complex and harder to maintain.
> > > > >
> > > > > - What parts need information about this?
> > > > > -> do the slot profiles need information about the GPU?
> > > > > -> Can the GPU Manager be a "self contained" thing that simply
> takes
> > > > > the configuration, and then abstracts everything internally?
> > Operators
> > > > can
> > > > > access it via "GPUManager.get()" or so?
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <ka...@gmail.com>
> > wrote:
> > > > >
> > > > > > Thanks for all the feedbacks.
> > > > > >
> > > > > > @Becket
> > > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to
> the
> > > > > > Public API section.
> > > > > >
> > > > > >
> > > > > > @Stephan @Becket
> > > > > > Regarding the general extended resource mechanism, I second
> > Xintong's
> > > > > > suggestion.
> > > > > > - It's better to leverage ResourceProfile and ResourceSpec after
> we
> > > > > > supporting fine-grained GPU scheduling. As a first step
> proposal, I
> > > > > > prefer to not include it in the scope of this FLIP.
> > > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > > correctly, it just a code refactoring atm, we could extract the
> > > > > > open/close/allocateExtendResources of GPUManager to that
> > interface. If
> > > > > > that is the case, +1 to do it during implementation.
> > > > > >
> > > > > > @Xingbo
> > > > > > As Xintong said, we looked into how Spark supports a general
> > "Custom
> > > > > > Resource Scheduling" before and decided to introduce a common
> > resource
> > > > > > configuration
> > > > > >
> schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > to make it more extensible. I think the "resource" is a proper
> > level
> > > > > > to contain all the configs of extended resources.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hxbks2ks@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > >
> > > > > > > There is no doubt that GPU resource management support will
> > greatly
> > > > > > > facilitate the development of AI-related applications by
> PyFlink
> > > > users.
> > > > > > >
> > > > > > > I have only one comment about this wiki:
> > > > > > >
> > > > > > > Regarding the names of several GPU configurations, I think it
> is
> > > > better
> > > > > > to
> > > > > > > delete the resource field makes it consistent with the names of
> > other
> > > > > > > resource-related configurations in TaskManagerOption.
> > > > > > >
> > > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Xingbo
> > > > > > >
> > > > > > >
> > > > > > > Xintong Song <to...@gmail.com> 于2020年3月4日周三 上午10:39写道:
> > > > > > >
> > > > > > > > @Stephan, @Becket,
> > > > > > > >
> > > > > > > > Actually, Yangze, Yang and I also had an offline discussion
> > about
> > > > > > making
> > > > > > > > the "GPU Support" as some general "Extended Resource
> Support".
> > We
> > > > > > believe
> > > > > > > > supporting extended resources in a general mechanism is
> > definitely
> > > > a
> > > > > > good
> > > > > > > > and extensible way. The reason we propose this FLIP narrowing
> > its
> > > > scope
> > > > > > > > down to GPU alone, is mainly for the concern on extra efforts
> > and
> > > > > > review
> > > > > > > > capacity needed for a general mechanism.
> > > > > > > >
> > > > > > > > To come up with a well design on a general extended resource
> > > > management
> > > > > > > > mechanism, we would need to investigate more on how people
> use
> > > > > > different
> > > > > > > > kind of resources in practice. For GPU, we learnt such
> > knowledge
> > > > from
> > > > > > the
> > > > > > > > experts, Becket and his team members. But for FPGA, or other
> > > > potential
> > > > > > > > extended resources, we don't have such convenient information
> > > > sources,
> > > > > > > > making the investigation requires more efforts, which I tend
> to
> > > > think
> > > > > > is
> > > > > > > > not necessary atm.
> > > > > > > >
> > > > > > > > On the other hand, we also looked into how Spark supports a
> > general
> > > > > > "Custom
> > > > > > > > Resource Scheduling". Assuming we want to have a similar
> > general
> > > > > > extended
> > > > > > > > resource mechanism in the future, we believe that the current
> > GPU
> > > > > > support
> > > > > > > > design can be easily extended, in an incremental way without
> > too
> > > > many
> > > > > > > > reworks.
> > > > > > > >
> > > > > > > > - The most important part is probably user interfaces. Spark
> > > > offers
> > > > > > > > configuration options to define the amount, discovery script
> > and
> > > > > > vendor
> > > > > > > > (on
> > > > > > > > k8s) in a per resource type bias [1], which is very similar
> to
> > > > what
> > > > > > we
> > > > > > > > proposed in this FLIP. I think it's not necessary to expose
> > > > config
> > > > > > > > options
> > > > > > > > in the general way atm, since we do not have supports for
> other
> > > > > > resource
> > > > > > > > types now. If later we decided to have per resource type
> config
> > > > > > > > options, we
> > > > > > > > can have backwards compatibility on the current proposed
> > options
> > > > > > with
> > > > > > > > simple key mapping.
> > > > > > > > - For the GPU Manager, if later needed we can change it to a
> > > > > > "Extended
> > > > > > > > Resource Manager" (or whatever it is called). That should be
> a
> > > > pure
> > > > > > > > component-internal refactoring.
> > > > > > > > - For ResourceProfile and ResourceSpec, there are already
> > > > fields for
> > > > > > > > general extended resource. We can of course leverage them
> when
> > > > > > > > supporting
> > > > > > > > fine grained GPU scheduling. That is also not in the scope of
> > > > this
> > > > > > first
> > > > > > > > step proposal, and would require FLIP-56 to be finished
> first.
> > > > > > > >
> > > > > > > > To summary up, I agree with Becket that have a separate FLIP
> > for
> > > > the
> > > > > > > > general extended resource mechanism, and keep it in mind when
> > > > > > discussing
> > > > > > > > and implementing the current one.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > >
> > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> > becket.qin@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > That's a good point, Stephan. It makes total sense to
> > generalize
> > > > the
> > > > > > > > > resource management to support custom resources. Having
> that
> > > > allows
> > > > > > users
> > > > > > > > > to add new resources by themselves. The general resource
> > > > management
> > > > > > may
> > > > > > > > > involve two different aspects:
> > > > > > > > >
> > > > > > > > > 1. The custom resource type definition. It is supported by
> > the
> > > > > > extended
> > > > > > > > > resources in ResourceProfile and ResourceSpec. This will
> > likely
> > > > cover
> > > > > > > > > majority of the cases.
> > > > > > > > >
> > > > > > > > > 2. The custom resource allocation logic, i.e. how to assign
> > the
> > > > > > resources
> > > > > > > > > to different tasks, operators, and so on. This may require
> > two
> > > > > > levels /
> > > > > > > > > steps:
> > > > > > > > > a. Subtask level - make sure the subtasks are put into
> > > > suitable
> > > > > > > > slots.
> > > > > > > > > It is done by the global RM and is not customizable right
> > now.
> > > > > > > > > b. Operator level - map the exact resource to the operators
> > > > in
> > > > > > TM.
> > > > > > > > e.g.
> > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is
> > needed
> > > > > > assuming
> > > > > > > > > the global RM does not distinguish individual resources of
> > the
> > > > same
> > > > > > type.
> > > > > > > > > It is true for memory, but not for GPU.
> > > > > > > > >
> > > > > > > > > The GPU manager is designed to do 2.b here. So it should
> > > > discover the
> > > > > > > > > physical GPU information and bind/match them to each
> > operators.
> > > > > > Making
> > > > > > > > this
> > > > > > > > > general will fill in the missing piece to support custom
> > resource
> > > > > > type
> > > > > > > > > definition. But I'd avoid calling it a "External Resource
> > > > Manager" to
> > > > > > > > avoid
> > > > > > > > > confusion with RM, maybe something like "Operator Resource
> > > > Assigner"
> > > > > > > > would
> > > > > > > > > be more accurate. So for each resource type users can have
> an
> > > > > > optional
> > > > > > > > > "Operator Resource Assigner" in the TM. For memory, users
> > don't
> > > > need
> > > > > > > > this,
> > > > > > > > > but for other extended resources, users may need that.
> > > > > > > > >
> > > > > > > > > Personally I think a pluggable "Operator Resource Assigner"
> > is
> > > > > > achievable
> > > > > > > > > in this FLIP. But I am also OK with having that in a
> separate
> > > > FLIP
> > > > > > > > because
> > > > > > > > > the interface between the "Operator Resource Assigner" and
> > > > operator
> > > > > > may
> > > > > > > > > take a while to settle down if we want to make it generic.
> > But I
> > > > > > think
> > > > > > > > our
> > > > > > > > > implementation should take this future work into
> > consideration so
> > > > > > that we
> > > > > > > > > don't need to break backwards compatibility once we have
> > that.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> > sewen@apache.org>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > > >
> > > > > > > > > > I cannot really give much input into the mechanics of
> > GPU-aware
> > > > > > > > > scheduling
> > > > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > > > >
> > > > > > > > > > One thought I had when reading the proposal is if it
> makes
> > > > sense to
> > > > > > > > look
> > > > > > > > > at
> > > > > > > > > > the "GPU Manager" as an "External Resource Manager", and
> > GPU
> > > > is one
> > > > > > > > such
> > > > > > > > > > resource.
> > > > > > > > > > The way I understand the ResourceProfile and
> ResourceSpec,
> > > > that is
> > > > > > how
> > > > > > > > it
> > > > > > > > > > is done there.
> > > > > > > > > > It has the advantage that it looks more extensible. Maybe
> > > > there is
> > > > > > a
> > > > > > > > GPU
> > > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> > > > Resource, a
> > > > > > > > Alibaba
> > > > > > > > > > TPU Resource, etc.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stephan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > > becket.qin@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management
> > support
> > > > is a
> > > > > > > > > > must-have
> > > > > > > > > > > for machine learning use cases. Actually it is one of
> the
> > > > mostly
> > > > > > > > asked
> > > > > > > > > > > question from the users who are interested in using
> Flink
> > > > for ML.
> > > > > > > > > > >
> > > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > > 1. The WebUI / REST API should probably also be
> > mentioned in
> > > > the
> > > > > > > > public
> > > > > > > > > > > interface section.
> > > > > > > > > > > 2. Is the data structure that holds GPU info also a
> > public
> > > > API?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > > > tonysong820@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for drafting the FLIP and kicking off the
> > > > discussion,
> > > > > > > > Yangze.
> > > > > > > > > > > >
> > > > > > > > > > > > Big +1 for this feature. Supporting using of GPU in
> > Flink
> > > > is
> > > > > > > > > > significant,
> > > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to
> > me. I
> > > > > > think
> > > > > > > > > it's a
> > > > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > > > karmagyz@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > >
> > > > > > > > > > > > > We would like to start a discussion thread on
> > "FLIP-108:
> > > > Add
> > > > > > GPU
> > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > >
> > > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Enable user to configure how many GPUs in a task
> > > > executor
> > > > > > and
> > > > > > > > > > > > > forward such requirements to the external resource
> > > > managers
> > > > > > (for
> > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > > - Provide information of available GPU resources to
> > > > > > operators.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Forward GPU resource requirements to
> > Yarn/Kubernetes.
> > > > > > > > > > > > > - Introduce GPUManager as one of the task manager
> > > > services to
> > > > > > > > > > discover
> > > > > > > > > > > > > and expose GPU resource information to the context
> of
> > > > > > functions.
> > > > > > > > > > > > > - Introduce the default script for GPU discovery,
> in
> > > > which we
> > > > > > > > > provide
> > > > > > > > > > > > > the privilege mode to help user to achieve
> > worker-level
> > > > > > isolation
> > > > > > > > > in
> > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please find more details in the FLIP wiki document
> > [1].
> > > > > > Looking
> > > > > > > > > > forward
> > > > > > > > > > > > to
> > > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Xintong Song <to...@gmail.com>.
@Yangze,
I think what Stephan means (@Stephan, please correct me if I'm wrong) is
that we might not need to hold and maintain the GPUManager as a service in
TaskManagerServices or RuntimeContext. An alternative is to create /
retrieve the GPUManager only in the operators that need it, e.g., with a
static method `GPUManager.get()`.

@Stephan,
I agree with you on excluding GPUManager from TaskManagerServices.

   - For the first step, where we provide unified TM-level GPU information
   to all operators, it should be fine to have operators access /
   lazily initialize GPUManager by themselves.
   - In the future, we might have more fine-grained GPU management, where
   we need to maintain GPUManager as a service and put GPU info in slot
   profiles. But at least for now it's not necessary to introduce such
   complexity.

However, I have some concerns about excluding GPUManager from RuntimeContext
and letting operators access it directly.

   - The configuration needed for creating the GPUManager is not always
   available to operators.
   - If later we want to have fine-grained control over GPUs (e.g.,
   operators in each slot can only see GPUs reserved for that slot), the
   approach cannot be easily extended.

I would suggest wrapping the GPUManager behind RuntimeContext and only
expose the GPUInfo to users. For now, we can declare a method
`getGPUInfo()` in RuntimeContext, with a default definition that calls
`GPUManager.get()` to get the lazily-created GPUManager. If later we want
to create / retrieve GPUManager in a different way, we can simply change
how `getGPUInfo` is implemented, without needing to change any public
interfaces.
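
A minimal sketch of this suggestion, to make the shape of the API concrete.
All types below are hypothetical stand-ins written for illustration, not the
actual Flink classes, and the discovery step is hard-coded where a real
implementation would presumably run the configured discovery script.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical value object exposed to users; the real GPUInfo is defined by the FLIP.
final class GPUInfo {
    private final int index;

    GPUInfo(int index) {
        this.index = index;
    }

    int getIndex() {
        return index;
    }
}

// Hypothetical stand-in for the TM-level manager, created lazily on first access.
final class GPUManager {
    private static GPUManager instance;
    private final List<GPUInfo> gpus;

    private GPUManager(List<GPUInfo> gpus) {
        this.gpus = gpus;
    }

    static synchronized GPUManager get() {
        if (instance == null) {
            // Assumption: two GPUs are hard-coded here instead of being discovered.
            instance = new GPUManager(Collections.unmodifiableList(
                    Arrays.asList(new GPUInfo(0), new GPUInfo(1))));
        }
        return instance;
    }

    List<GPUInfo> getGPUs() {
        return gpus;
    }
}

// Simplified stand-in for RuntimeContext: users only see GPUInfo, never the manager.
interface RuntimeContext {
    default List<GPUInfo> getGPUInfo() {
        return GPUManager.get().getGPUs();
    }
}
```

With this shape, a later switch to a per-slot or service-held manager would
only change the body of `getGPUInfo()`; callers keep using the same method.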

Thank you~

Xintong Song



On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <ka...@gmail.com> wrote:

> @Stephan
> Do you mean Minicluster? Yes, it makes sense to share the GPU Manager
> in such a scenario.
> If that's what you worry about, I'm +1 for holding
> GPUManager(ExternalResourceManagers) in TaskExecutor instead of
> TaskManagerServices.
>
> Regarding the RuntimeContext/FunctionContext, it just holds the GPU
> info instead of the GPU Manager. AFAIK, it's the only place we could
> pass GPU info to the RichFunction/UserDefinedFunction.
>
> Best,
> Yangze Guo
>
> On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <is...@paddlesoft.net>
> wrote:
> >
> >
> >
> >
> >
> > ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote ----
> >
> > > > Can we somehow keep this out of the TaskManager services
> > > I fear that we could not. IMO, the GPUManager(or
> > > ExternalServicesManagers in future) is conceptually one of the task
> > > manager services, just like MemoryManager before 1.10.
> > > - It maintains/holds the GPU resource at TM level and all of the
> > > operators allocate the GPU resources from it. So, it should be
> > > exclusive to a single TaskExecutor.
> > > - We could add a collection called ExternalResourceManagers to hold
> > > all managers of other external resources in the future.
> > >
> >
> > Can you help me understand why this needs the addition in
> > TaskManagerServices or in the RuntimeContext?
> > Are you worried about the case when multiple Task Executors run in the
> same
> > JVM? That's not common, but wouldn't it actually be good in that case to
> > share the GPU Manager, given that the GPU is shared?
> >
> > Thanks,
> > Stephan
> >
> > ---------------------------
> >
> >
> > > What parts need information about this?
> > > In this FLIP, operators need the information. Thus, we expose GPU
> > > information to the RuntimeContext/FunctionContext. The slot profile is
> > > not aware of GPU resources as GPU is TM level resource now.
> > >
> > > > Can the GPU Manager be a "self contained" thing that simply takes the
> > > configuration, and then abstracts everything internally?
> > > Yes, we just pass the path/args of the discovery script and how many
> > > GPUs per TM to it. It takes the responsibility to get the GPU
> > > information and expose it to the RuntimeContext/FunctionContext of
> > > operators. Meanwhile, we'd better not allow operators to directly
> > > access GPUManager; they should get what they need from the Context. We
> > > could then decouple the interface/implementation of GPUManager from the
> > > Public API.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > It sounds fine to initially start with GPU specific support and think
> > > about
> > > > generalizing this once we better understand the space.
> > > >
> > > > About the implementation suggested in FLIP-108:
> > > > - Can we somehow keep this out of the TaskManager services? Anything
> we
> > > > have to pull through all layers of the TM makes the TM components yet
> > > more
> > > > complex and harder to maintain.
> > > >
> > > > - What parts need information about this?
> > > > -> do the slot profiles need information about the GPU?
> > > > -> Can the GPU Manager be a "self contained" thing that simply takes
> > > > the configuration, and then abstracts everything internally?
> Operators
> > > can
> > > > access it via "GPUManager.get()" or so?
> > > >
> > > >
> > > >
> > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <ka...@gmail.com>
> wrote:
> > > >
> > > > > Thanks for all the feedbacks.
> > > > >
> > > > > @Becket
> > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > > > Public API section.
> > > > >
> > > > >
> > > > > @Stephan @Becket
> > > > > Regarding the general extended resource mechanism, I second
> Xintong's
> > > > > suggestion.
> > > > > - It's better to leverage ResourceProfile and ResourceSpec after we
> > > > > supporting fine-grained GPU scheduling. As a first step proposal, I
> > > > > prefer to not include it in the scope of this FLIP.
> > > > > - Regarding the "Extended Resource Manager", if I understand
> > > > > correctly, it just a code refactoring atm, we could extract the
> > > > > open/close/allocateExtendResources of GPUManager to that
> interface. If
> > > > > that is the case, +1 to do it during implementation.
> > > > >
> > > > > @Xingbo
> > > > > As Xintong said, we looked into how Spark supports a general
> "Custom
> > > > > Resource Scheduling" before and decided to introduce a common
> resource
> > > > > configuration
> > > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > to make it more extensible. I think the "resource" is a proper
> level
> > > > > to contain all the configs of extended resources.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hx...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > >
> > > > > > There is no doubt that GPU resource management support will
> greatly
> > > > > > facilitate the development of AI-related applications by PyFlink
> > > users.
> > > > > >
> > > > > > I have only one comment about this wiki:
> > > > > >
> > > > > > Regarding the names of several GPU configurations, I think it is
> > > better
> > > > > to
> > > > > > delete the resource field makes it consistent with the names of
> other
> > > > > > resource-related configurations in TaskManagerOption.
> > > > > >
> > > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > > taskmanager.gpu.discovery-script.path
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xingbo
> > > > > >
> > > > > >
> > > > > > Xintong Song <to...@gmail.com> 于2020年3月4日周三 上午10:39写道:
> > > > > >
> > > > > > > @Stephan, @Becket,
> > > > > > >
> > > > > > > Actually, Yangze, Yang and I also had an offline discussion
> about
> > > > > making
> > > > > > > the "GPU Support" as some general "Extended Resource Support".
> We
> > > > > believe
> > > > > > > supporting extended resources in a general mechanism is
> definitely
> > > a
> > > > > good
> > > > > > > and extensible way. The reason we propose this FLIP narrowing
> its
> > > scope
> > > > > > > down to GPU alone, is mainly for the concern on extra efforts
> and
> > > > > review
> > > > > > > capacity needed for a general mechanism.
> > > > > > >
> > > > > > > To come up with a well design on a general extended resource
> > > management
> > > > > > > mechanism, we would need to investigate more on how people use
> > > > > different
> > > > > > > kind of resources in practice. For GPU, we learnt such
> knowledge
> > > from
> > > > > the
> > > > > > > experts, Becket and his team members. But for FPGA, or other
> > > potential
> > > > > > > extended resources, we don't have such convenient information
> > > sources,
> > > > > > > making the investigation requires more efforts, which I tend to
> > > think
> > > > > is
> > > > > > > not necessary atm.
> > > > > > >
> > > > > > > On the other hand, we also looked into how Spark supports a
> general
> > > > > "Custom
> > > > > > > Resource Scheduling". Assuming we want to have a similar
> general
> > > > > extended
> > > > > > > resource mechanism in the future, we believe that the current
> GPU
> > > > > support
> > > > > > > design can be easily extended, in an incremental way without
> too
> > > many
> > > > > > > reworks.
> > > > > > >
> > > > > > > - The most important part is probably user interfaces. Spark
> > > offers
> > > > > > > configuration options to define the amount, discovery script
> and
> > > > > vendor
> > > > > > > (on
> > > > > > > k8s) in a per resource type bias [1], which is very similar to
> > > what
> > > > > we
> > > > > > > proposed in this FLIP. I think it's not necessary to expose
> > > config
> > > > > > > options
> > > > > > > in the general way atm, since we do not have supports for other
> > > > > resource
> > > > > > > types now. If later we decided to have per resource type config
> > > > > > > options, we
> > > > > > > can have backwards compatibility on the current proposed
> options
> > > > > with
> > > > > > > simple key mapping.
> > > > > > > - For the GPU Manager, if later needed we can change it to a
> > > > > "Extended
> > > > > > > Resource Manager" (or whatever it is called). That should be a
> > > pure
> > > > > > > component-internal refactoring.
> > > > > > > - For ResourceProfile and ResourceSpec, there are already
> > > fields for
> > > > > > > general extended resource. We can of course leverage them when
> > > > > > > supporting
> > > > > > > fine grained GPU scheduling. That is also not in the scope of
> > > this
> > > > > first
> > > > > > > step proposal, and would require FLIP-56 to be finished first.
> > > > > > >
> > > > > > > To summary up, I agree with Becket that have a separate FLIP
> for
> > > the
> > > > > > > general extended resource mechanism, and keep it in mind when
> > > > > discussing
> > > > > > > and implementing the current one.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > >
> > > > >
> > >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > >
> > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <
> becket.qin@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > That's a good point, Stephan. It makes total sense to
> generalize
> > > the
> > > > > > > > resource management to support custom resources. Having that
> > > allows
> > > > > users
> > > > > > > > to add new resources by themselves. The general resource
> > > management
> > > > > may
> > > > > > > > involve two different aspects:
> > > > > > > >
> > > > > > > > 1. The custom resource type definition. It is supported by
> the
> > > > > extended
> > > > > > > > resources in ResourceProfile and ResourceSpec. This will
> likely
> > > cover
> > > > > > > > majority of the cases.
> > > > > > > >
> > > > > > > > 2. The custom resource allocation logic, i.e. how to assign
> the
> > > > > resources
> > > > > > > > to different tasks, operators, and so on. This may require
> two
> > > > > levels /
> > > > > > > > steps:
> > > > > > > > a. Subtask level - make sure the subtasks are put into
> > > suitable
> > > > > > > slots.
> > > > > > > > It is done by the global RM and is not customizable right
> now.
> > > > > > > > b. Operator level - map the exact resource to the operators
> > > in
> > > > > TM.
> > > > > > > e.g.
> > > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is
> needed
> > > > > assuming
> > > > > > > > the global RM does not distinguish individual resources of
> the
> > > same
> > > > > type.
> > > > > > > > It is true for memory, but not for GPU.
> > > > > > > >
> > > > > > > > The GPU manager is designed to do 2.b here. So it should
> > > discover the
> > > > > > > > physical GPU information and bind/match them to each
> operators.
> > > > > Making
> > > > > > > this
> > > > > > > > general will fill in the missing piece to support custom
> resource
> > > > > type
> > > > > > > > definition. But I'd avoid calling it a "External Resource
> > > Manager" to
> > > > > > > avoid
> > > > > > > > confusion with RM, maybe something like "Operator Resource
> > > Assigner"
> > > > > > > would
> > > > > > > > be more accurate. So for each resource type users can have an
> > > > > optional
> > > > > > > > "Operator Resource Assigner" in the TM. For memory, users
> don't
> > > need
> > > > > > > this,
> > > > > > > > but for other extended resources, users may need that.
> > > > > > > >
> > > > > > > > Personally I think a pluggable "Operator Resource Assigner"
> is
> > > > > achievable
> > > > > > > > in this FLIP. But I am also OK with having that in a separate
> > > FLIP
> > > > > > > because
> > > > > > > > the interface between the "Operator Resource Assigner" and
> > > operator
> > > > > may
> > > > > > > > take a while to settle down if we want to make it generic.
> But I
> > > > > think
> > > > > > > our
> > > > > > > > implementation should take this future work into
> consideration so
> > > > > that we
> > > > > > > > don't need to break backwards compatibility once we have
> that.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <
> sewen@apache.org>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > >
> > > > > > > > > I cannot really give much input into the mechanics of
> GPU-aware
> > > > > > > > scheduling
> > > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > > >
> > > > > > > > > One thought I had when reading the proposal is if it makes
> > > sense to
> > > > > > > look
> > > > > > > > at
> > > > > > > > > the "GPU Manager" as an "External Resource Manager", and
> GPU
> > > is one
> > > > > > > such
> > > > > > > > > resource.
> > > > > > > > > The way I understand the ResourceProfile and ResourceSpec,
> > > that is
> > > > > how
> > > > > > > it
> > > > > > > > > is done there.
> > > > > > > > > It has the advantage that it looks more extensible. Maybe
> > > there is
> > > > > a
> > > > > > > GPU
> > > > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> > > Resource, a
> > > > > > > Alibaba
> > > > > > > > > TPU Resource, etc.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stephan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > > becket.qin@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the FLIP Yangze. GPU resource management
> support
> > > is a
> > > > > > > > > must-have
> > > > > > > > > > for machine learning use cases. Actually it is one of the
> > > mostly
> > > > > > > asked
> > > > > > > > > > question from the users who are interested in using Flink
> > > for ML.
> > > > > > > > > >
> > > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > > 1. The WebUI / REST API should probably also be
> mentioned in
> > > the
> > > > > > > public
> > > > > > > > > > interface section.
> > > > > > > > > > 2. Is the data structure that holds GPU info also a
> public
> > > API?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > > tonysong820@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for drafting the FLIP and kicking off the
> > > discussion,
> > > > > > > Yangze.
> > > > > > > > > > >
> > > > > > > > > > > Big +1 for this feature. Supporting using of GPU in
> Flink
> > > is
> > > > > > > > > significant,
> > > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to
> me. I
> > > > > think
> > > > > > > > it's a
> > > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > > karmagyz@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > We would like to start a discussion thread on
> "FLIP-108:
> > > Add
> > > > > GPU
> > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > >
> > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > >
> > > > > > > > > > > > - Enable user to configure how many GPUs in a task
> > > executor
> > > > > and
> > > > > > > > > > > > forward such requirements to the external resource
> > > managers
> > > > > (for
> > > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > - Provide information of available GPU resources to
> > > > > operators.
> > > > > > > > > > > >
> > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > - Forward GPU resource requirements to
> Yarn/Kubernetes.
> > > > > > > > > > > > - Introduce GPUManager as one of the task manager
> > > services to
> > > > > > > > > discover
> > > > > > > > > > > > and expose GPU resource information to the context of
> > > > > functions.
> > > > > > > > > > > > - Introduce the default script for GPU discovery, in
> > > which we
> > > > > > > > provide
> > > > > > > > > > > > the privilege mode to help user to achieve
> worker-level
> > > > > isolation
> > > > > > > > in
> > > > > > > > > > > > standalone mode.
> > > > > > > > > > > >
> > > > > > > > > > > > Please find more details in the FLIP wiki document
> [1].
> > > > > Looking
> > > > > > > > > forward
> > > > > > > > > > > to
> > > > > > > > > > > > your feedbacks.
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Yangze Guo <ka...@gmail.com>.
@Stephan
Do you mean Minicluster? Yes, it makes sense to share the GPU Manager
in such a scenario.
If that's what you worry about, I'm +1 for holding
GPUManager(ExternalResourceManagers) in TaskExecutor instead of
TaskManagerServices.

Regarding the RuntimeContext/FunctionContext, it just holds the GPU
info instead of the GPU Manager. AFAIK, it's the only place we could
pass GPU info to the RichFunction/UserDefinedFunction.
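
To illustrate the split described above, here is a rough sketch in which the
executor side owns the discovered GPUs and the user function only receives GPU
info through its context. Every class name here is hypothetical and
simplified; none of them is an actual Flink type, and the discovery result is
hard-coded as an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical context handed to user functions; it carries GPU info only.
final class SketchRuntimeContext {
    private final List<Integer> gpuIndexes;

    SketchRuntimeContext(List<Integer> gpuIndexes) {
        this.gpuIndexes = gpuIndexes;
    }

    List<Integer> getGPUIndexes() {
        return gpuIndexes;
    }
}

// Hypothetical stand-in for the executor side, which keeps the manager state.
final class SketchTaskExecutor {
    // Assumption: discovery is replaced by a fixed list of two GPU indexes.
    private final List<Integer> discoveredGpus = Arrays.asList(0, 1);

    SketchRuntimeContext createContext() {
        return new SketchRuntimeContext(discoveredGpus);
    }
}

// Simplified user function in the style of a RichFunction: it picks a GPU in open().
final class GpuMapFunction {
    private int gpuIndex = -1;

    void open(SketchRuntimeContext ctx) {
        List<Integer> gpus = ctx.getGPUIndexes();
        if (!gpus.isEmpty()) {
            gpuIndex = gpus.get(0);
        }
    }

    String map(String value) {
        return value + "@gpu" + gpuIndex;
    }
}
```

The function never touches the manager itself, so where the manager lives
(TaskExecutor or TaskManagerServices) stays an internal decision.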

Best,
Yangze Guo

On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <is...@paddlesoft.net> wrote:
>
>
>
>
>
> ---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote ----
>
> > > Can we somehow keep this out of the TaskManager services
> > I fear that we could not. IMO, the GPUManager(or
> > ExternalServicesManagers in future) is conceptually one of the task
> > manager services, just like MemoryManager before 1.10.
> > - It maintains/holds the GPU resource at TM level and all of the
> > operators allocate the GPU resources from it. So, it should be
> > exclusive to a single TaskExecutor.
> > - We could add a collection called ExternalResourceManagers to hold
> > all managers of other external resources in the future.
> >
>
> Can you help me understand why this needs the addition in TaskManagerServices
> or in the RuntimeContext?
> Are you worried about the case when multiple Task Executors run in the same
> JVM? That's not common, but wouldn't it actually be good in that case to
> share the GPU Manager, given that the GPU is shared?
>
> Thanks,
> Stephan
>
> ---------------------------
>
>
> > What parts need information about this?
> > In this FLIP, operators need the information. Thus, we expose GPU
> > information to the RuntimeContext/FunctionContext. The slot profile is
> > not aware of GPU resources as GPU is TM level resource now.
> >
> > > Can the GPU Manager be a "self contained" thing that simply takes the
> > configuration, and then abstracts everything internally?
> > Yes, we just pass the path/args of the discovery script and how many
> > GPUs per TM to it. It takes the responsibility to get the GPU
> > information and expose it to the RuntimeContext/FunctionContext of
> > operators. Meanwhile, we'd better not allow operators to directly
> > access GPUManager; they should get what they need from the Context. We
> > could then decouple the interface/implementation of GPUManager from the
> > Public API.
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org> wrote:
> > >
> > > It sounds fine to initially start with GPU specific support and think
> > about
> > > generalizing this once we better understand the space.
> > >
> > > About the implementation suggested in FLIP-108:
> > > - Can we somehow keep this out of the TaskManager services? Anything we
> > > have to pull through all layers of the TM makes the TM components yet
> > more
> > > complex and harder to maintain.
> > >
> > > - What parts need information about this?
> > > -> do the slot profiles need information about the GPU?
> > > -> Can the GPU Manager be a "self contained" thing that simply takes
> > > the configuration, and then abstracts everything internally? Operators
> > can
> > > access it via "GPUManager.get()" or so?
> > >
> > >
> > >
> > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <ka...@gmail.com> wrote:
> > >
> > > > Thanks for all the feedbacks.
> > > >
> > > > @Becket
> > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > > Public API section.
> > > >
> > > >
> > > > @Stephan @Becket
> > > > Regarding the general extended resource mechanism, I second Xintong's
> > > > suggestion.
> > > > - It's better to leverage ResourceProfile and ResourceSpec once we
> > > > support fine-grained GPU scheduling. As a first step proposal, I
> > > > prefer to not include it in the scope of this FLIP.
> > > > - Regarding the "Extended Resource Manager", if I understand
> > > > correctly, it is just a code refactoring atm; we could extract the
> > > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > > that is the case, +1 to do it during implementation.
> > > >
> > > > @Xingbo
> > > > As Xintong said, we looked into how Spark supports a general "Custom
> > > > Resource Scheduling" before and decided to introduce a common resource
> > > > configuration
> > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > to make it more extensible. I think the "resource" is a proper level
> > > > to contain all the configs of extended resources.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hx...@gmail.com>
> > wrote:
> > > > >
> > > > > Thanks a lot for the FLIP, Yangze.
> > > > >
> > > > > There is no doubt that GPU resource management support will greatly
> > > > > facilitate the development of AI-related applications by PyFlink
> > users.
> > > > >
> > > > > I have only one comment about this wiki:
> > > > >
> > > > > Regarding the names of several GPU configurations, I think it is better
> > > > > to delete the resource field, which makes them consistent with the names
> > > > > of other resource-related configurations in TaskManagerOption.
> > > > >
> > > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > > taskmanager.gpu.discovery-script.path
> > > > >
> > > > > Best,
> > > > >
> > > > > Xingbo
> > > > >
> > > > >
> > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <to...@gmail.com> wrote:
> > > > >
> > > > > > @Stephan, @Becket,
> > > > > >
> > > > > > Actually, Yangze, Yang and I also had an offline discussion about
> > > > making
> > > > > > the "GPU Support" as some general "Extended Resource Support". We
> > > > believe
> > > > > > supporting extended resources in a general mechanism is definitely
> > a
> > > > good
> > > > > > and extensible way. The reason we propose this FLIP narrowing its
> > scope
> > > > > > down to GPU alone, is mainly for the concern on extra efforts and
> > > > review
> > > > > > capacity needed for a general mechanism.
> > > > > >
> > > > > > To come up with a well design on a general extended resource
> > management
> > > > > > mechanism, we would need to investigate more on how people use
> > > > different
> > > > > > kind of resources in practice. For GPU, we learnt such knowledge
> > from
> > > > the
> > > > > > experts, Becket and his team members. But for FPGA, or other
> > potential
> > > > > > extended resources, we don't have such convenient information
> > sources,
> > > > > > making the investigation requires more efforts, which I tend to
> > think
> > > > is
> > > > > > not necessary atm.
> > > > > >
> > > > > > On the other hand, we also looked into how Spark supports a general
> > > > "Custom
> > > > > > Resource Scheduling". Assuming we want to have a similar general
> > > > extended
> > > > > > resource mechanism in the future, we believe that the current GPU
> > > > support
> > > > > > design can be easily extended, in an incremental way without too
> > many
> > > > > > reworks.
> > > > > >
> > > > > > - The most important part is probably user interfaces. Spark offers
> > > > > > configuration options to define the amount, discovery script and vendor
> > > > > > (on k8s) on a per-resource-type basis [1], which is very similar to what
> > > > > > we proposed in this FLIP. I think it's not necessary to expose config
> > > > > > options in the general way atm, since we do not have support for other
> > > > > > resource types now. If we later decide to have per-resource-type config
> > > > > > options, we can keep backwards compatibility with the currently proposed
> > > > > > options through a simple key mapping.
> > > > > > - For the GPU Manager, if needed later we can change it to an "Extended
> > > > > > Resource Manager" (or whatever it is called). That should be a pure
> > > > > > component-internal refactoring.
> > > > > > - For ResourceProfile and ResourceSpec, there are already fields for
> > > > > > general extended resources. We can of course leverage them when
> > > > > > supporting fine-grained GPU scheduling. That is also not in the scope of
> > > > > > this first-step proposal, and would require FLIP-56 to be finished first.
> > > > > >
> > > > > > To sum up, I agree with Becket that we should have a separate FLIP for
> > > > > > the general extended resource mechanism, and keep it in mind when
> > > > > > discussing and implementing the current one.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <be...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > That's a good point, Stephan. It makes total sense to generalize
> > the
> > > > > > > resource management to support custom resources. Having that
> > allows
> > > > users
> > > > > > > to add new resources by themselves. The general resource
> > management
> > > > may
> > > > > > > involve two different aspects:
> > > > > > >
> > > > > > > 1. The custom resource type definition. It is supported by the
> > > > extended
> > > > > > > resources in ResourceProfile and ResourceSpec. This will likely
> > cover
> > > > > > > majority of the cases.
> > > > > > >
> > > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > > resources
> > > > > > > to different tasks, operators, and so on. This may require two
> > > > levels /
> > > > > > > steps:
> > > > > > > a. Subtask level - make sure the subtasks are put into
> > suitable
> > > > > > slots.
> > > > > > > It is done by the global RM and is not customizable right now.
> > > > > > > b. Operator level - map the exact resource to the operators
> > in
> > > > TM.
> > > > > > e.g.
> > > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > > assuming
> > > > > > > the global RM does not distinguish individual resources of the
> > same
> > > > type.
> > > > > > > It is true for memory, but not for GPU.
> > > > > > >
> > > > > > > The GPU manager is designed to do 2.b here. So it should
> > discover the
> > > > > > > physical GPU information and bind/match them to each operators.
> > > > Making
> > > > > > this
> > > > > > > general will fill in the missing piece to support custom resource
> > > > type
> > > > > > > definition. But I'd avoid calling it a "External Resource
> > Manager" to
> > > > > > avoid
> > > > > > > confusion with RM, maybe something like "Operator Resource
> > Assigner"
> > > > > > would
> > > > > > > be more accurate. So for each resource type users can have an
> > > > optional
> > > > > > > "Operator Resource Assigner" in the TM. For memory, users don't
> > need
> > > > > > this,
> > > > > > > but for other extended resources, users may need that.
> > > > > > >
> > > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > > achievable
> > > > > > > in this FLIP. But I am also OK with having that in a separate
> > FLIP
> > > > > > because
> > > > > > > the interface between the "Operator Resource Assigner" and
> > operator
> > > > may
> > > > > > > take a while to settle down if we want to make it generic. But I
> > > > think
> > > > > > our
> > > > > > > implementation should take this future work into consideration so
> > > > that we
> > > > > > > don't need to break backwards compatibility once we have that.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <se...@apache.org>
> > > > wrote:
> > > > > > >
> > > > > > > > Thank you for writing this FLIP.
> > > > > > > >
> > > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > > scheduling
> > > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > > >
> > > > > > > > One thought I had when reading the proposal is if it makes
> > sense to
> > > > > > look
> > > > > > > at
> > > > > > > > the "GPU Manager" as an "External Resource Manager", and GPU
> > is one
> > > > > > such
> > > > > > > > resource.
> > > > > > > > The way I understand the ResourceProfile and ResourceSpec,
> > that is
> > > > how
> > > > > > it
> > > > > > > > is done there.
> > > > > > > > It has the advantage that it looks more extensible. Maybe
> > there is
> > > > a
> > > > > > GPU
> > > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> > Resource, a
> > > > > > Alibaba
> > > > > > > > TPU Resource, etc.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stephan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> > becket.qin@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the FLIP, Yangze. GPU resource management support is a
> > > > > > > > > must-have for machine learning use cases. Actually, it is one of the
> > > > > > > > > most frequently asked questions from users who are interested in
> > > > > > > > > using Flink for ML.
> > > > > > > > >
> > > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in
> > the
> > > > > > public
> > > > > > > > > interface section.
> > > > > > > > > 2. Is the data structure that holds GPU info also a public
> > API?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > > tonysong820@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for drafting the FLIP and kicking off the
> > discussion,
> > > > > > Yangze.
> > > > > > > > > >
> > > > > > > > > > Big +1 for this feature. Supporting using of GPU in Flink
> > is
> > > > > > > > significant,
> > > > > > > > > > especially for the ML scenarios.
> > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> > > > think
> > > > > > > it's a
> > > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > > >
> > > > > > > > > > Thank you~
> > > > > > > > > >
> > > > > > > > > > Xintong Song
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> > karmagyz@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > >
> > > > > > > > > > > We would like to start a discussion thread on "FLIP-108:
> > Add
> > > > GPU
> > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > >
> > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > >
> > > > > > > > > > > - Enable user to configure how many GPUs in a task
> > executor
> > > > and
> > > > > > > > > > > forward such requirements to the external resource
> > managers
> > > > (for
> > > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > - Provide information of available GPU resources to
> > > > operators.
> > > > > > > > > > >
> > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > >
> > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > - Introduce GPUManager as one of the task manager
> > services to
> > > > > > > > discover
> > > > > > > > > > > and expose GPU resource information to the context of
> > > > functions.
> > > > > > > > > > > - Introduce the default script for GPU discovery, in
> > which we
> > > > > > > provide
> > > > > > > > > > > the privilege mode to help user to achieve worker-level
> > > > isolation
> > > > > > > in
> > > > > > > > > > > standalone mode.
> > > > > > > > > > >
> > > > > > > > > > > Please find more details in the FLIP wiki document [1].
> > > > Looking
> > > > > > > > forward
> > > > > > > > > > to
> > > > > > > > > > > your feedbacks.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Yangze Guo
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
>
>
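
Becket's step 2.b quoted above (map concrete resources to operators inside
the TM, e.g. GPU 1 for operator A, GPU 2 for operator B) could look roughly
like the following Java sketch. The name OperatorResourceAssigner comes from
his wording; the method signature, the RoundRobinAssigner class and the
string identifiers are assumptions made only for illustration. None of this
is part of FLIP-108 or of any Flink API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interface: decides which concrete resource each operator gets. */
interface OperatorResourceAssigner<R> {
    Map<String, R> assign(List<String> operatorIds, List<R> resources);
}

/** Illustrative policy: hand out resources in order, wrapping around if needed. */
final class RoundRobinAssigner<R> implements OperatorResourceAssigner<R> {
    @Override
    public Map<String, R> assign(List<String> operatorIds, List<R> resources) {
        if (resources.isEmpty()) {
            throw new IllegalArgumentException("no resources discovered on this TM");
        }
        // LinkedHashMap keeps the operator order stable for logging and debugging.
        Map<String, R> assignment = new LinkedHashMap<>();
        for (int i = 0; i < operatorIds.size(); i++) {
            assignment.put(operatorIds.get(i), resources.get(i % resources.size()));
        }
        return assignment;
    }
}
```

With operators "A" and "B" and resources "GPU 1" and "GPU 2", this policy
maps A to GPU 1 and B to GPU 2. A pluggable design would let each resource
type supply its own policy, and memory would simply not register one.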

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Isaac Godfried <is...@paddlesoft.net>.




---- On Fri, 13 Mar 2020 15:58:20 +0000 sewen@apache.org wrote ----


> > Can we somehow keep this out of the TaskManager services
> I fear that we could not. IMO, the GPUManager(or
> ExternalServicesManagers in future) is conceptually one of the task
> manager services, just like MemoryManager before 1.10.
> - It maintains/holds the GPU resource at TM level and all of the
> operators allocate the GPU resources from it. So, it should be
> exclusive to a single TaskExecutor.
> - We could add a collection called ExternalResourceManagers to hold
> all managers of other external resources in the future.
>

Can you help me understand why this needs to be added to TaskManagerServices
or to the RuntimeContext?
Are you worried about the case when multiple Task Executors run in the same
JVM? That's not common, but wouldn't it actually be good in that case to
share the GPU Manager, given that the GPU is shared?

Thanks,
Stephan

---------------------------


> What parts need information about this?
> In this FLIP, operators need the information. Thus, we expose GPU
> information to the RuntimeContext/FunctionContext. The slot profile is
> not aware of GPU resources as GPU is TM level resource now.
>
> > Can the GPU Manager be a "self contained" thing that simply takes the
> configuration, and then abstracts everything internally?
> Yes, we just pass the path/args of the discovery script and how many
> GPUs per TM to it. It takes the responsibility to get the GPU
> information and expose it to the RuntimeContext/FunctionContext of
> operators. Meanwhile, we'd better not allow operators to directly
> access GPUManager; they should get what they want from the Context. We
> could then decouple the interface/implementation of GPUManager from the
> Public API.
>
> Best,
> Yangze Guo
>
> On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org> wrote:
> >
> > It sounds fine to initially start with GPU specific support and think about
> > generalizing this once we better understand the space.
> >
> > About the implementation suggested in FLIP-108:
> > - Can we somehow keep this out of the TaskManager services? Anything we
> > have to pull through all layers of the TM makes the TM components yet more
> > complex and harder to maintain.
> >
> > - What parts need information about this?
> > -> do the slot profiles need information about the GPU?
> > -> Can the GPU Manager be a "self contained" thing that simply takes
> > the configuration, and then abstracts everything internally? Operators can
> > access it via "GPUManager.get()" or so?
> >
> >
> >
> > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Thanks for all the feedback.
> > >
> > > @Becket
> > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > Public API section.
> > >
> > >
> > > @Stephan @Becket
> > > Regarding the general extended resource mechanism, I second Xintong's
> > > suggestion.
> > > - It's better to leverage ResourceProfile and ResourceSpec after we
> > > support fine-grained GPU scheduling. As a first-step proposal, I
> > > prefer not to include it in the scope of this FLIP.
> > > - Regarding the "Extended Resource Manager", if I understand
> > > correctly, it is just a code refactoring atm; we could extract the
> > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > that is the case, +1 to do it during implementation.
> > >
> > > @Xingbo
> > > As Xintong said, we looked into how Spark supports a general "Custom
> > > Resource Scheduling" before and decided to introduce a common resource
> > > configuration
> > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > to make it more extensible. I think the "resource" is a proper level
> > > to contain all the configs of extended resources.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hx...@gmail.com>
> wrote:
> > > >
> > > > Thanks a lot for the FLIP, Yangze.
> > > >
> > > > There is no doubt that GPU resource management support will greatly
> > > > facilitate the development of AI-related applications by PyFlink
> users.
> > > >
> > > > I have only one comment about this wiki:
> > > >
> > > > Regarding the names of several GPU configurations, I think it is better
> > > > to delete the resource field, which makes them consistent with the names
> > > > of other resource-related configurations in TaskManagerOption.
> > > >
> > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > taskmanager.gpu.discovery-script.path
> > > >
> > > > Best,
> > > >
> > > > Xingbo
> > > >
> > > >
> > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <to...@gmail.com> wrote:
> > > >
> > > > > @Stephan, @Becket,
> > > > >
> > > > > Actually, Yangze, Yang and I also had an offline discussion about
> > > making
> > > > > the "GPU Support" as some general "Extended Resource Support". We
> > > believe
> > > > > supporting extended resources in a general mechanism is definitely
> a
> > > good
> > > > > and extensible way. The reason we propose this FLIP narrowing its
> scope
> > > > > down to GPU alone, is mainly for the concern on extra efforts and
> > > review
> > > > > capacity needed for a general mechanism.
> > > > >
> > > > > To come up with a well design on a general extended resource
> management
> > > > > mechanism, we would need to investigate more on how people use
> > > different
> > > > > kind of resources in practice. For GPU, we learnt such knowledge
> from
> > > the
> > > > > experts, Becket and his team members. But for FPGA, or other
> potential
> > > > > extended resources, we don't have such convenient information
> sources,
> > > > > making the investigation requires more efforts, which I tend to
> think
> > > is
> > > > > not necessary atm.
> > > > >
> > > > > On the other hand, we also looked into how Spark supports a general
> > > "Custom
> > > > > Resource Scheduling". Assuming we want to have a similar general
> > > extended
> > > > > resource mechanism in the future, we believe that the current GPU
> > > support
> > > > > design can be easily extended, in an incremental way without too
> many
> > > > > reworks.
> > > > >
> > > > > - The most important part is probably user interfaces. Spark offers
> > > > > configuration options to define the amount, discovery script and vendor
> > > > > (on k8s) on a per-resource-type basis [1], which is very similar to what
> > > > > we proposed in this FLIP. I think it's not necessary to expose config
> > > > > options in the general way atm, since we do not have support for other
> > > > > resource types now. If we later decide to have per-resource-type config
> > > > > options, we can keep backwards compatibility with the currently proposed
> > > > > options through a simple key mapping.
> > > > > - For the GPU Manager, if needed later we can change it to an "Extended
> > > > > Resource Manager" (or whatever it is called). That should be a pure
> > > > > component-internal refactoring.
> > > > > - For ResourceProfile and ResourceSpec, there are already fields for
> > > > > general extended resources. We can of course leverage them when
> > > > > supporting fine-grained GPU scheduling. That is also not in the scope of
> > > > > this first-step proposal, and would require FLIP-56 to be finished first.
> > > > >
> > > > > To sum up, I agree with Becket that we should have a separate FLIP for
> > > > > the general extended resource mechanism, and keep it in mind when
> > > > > discussing and implementing the current one.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1] https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > >
> > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > > > That's a good point, Stephan. It makes total sense to generalize
> the
> > > > > > resource management to support custom resources. Having that
> allows
> > > users
> > > > > > to add new resources by themselves. The general resource
> management
> > > may
> > > > > > involve two different aspects:
> > > > > >
> > > > > > 1. The custom resource type definition. It is supported by the
> > > extended
> > > > > > resources in ResourceProfile and ResourceSpec. This will likely
> cover
> > > > > > majority of the cases.
> > > > > >
> > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > resources
> > > > > > to different tasks, operators, and so on. This may require two
> > > levels /
> > > > > > steps:
> > > > > > a. Subtask level - make sure the subtasks are put into
> suitable
> > > > > slots.
> > > > > > It is done by the global RM and is not customizable right now.
> > > > > > b. Operator level - map the exact resource to the operators
> in
> > > TM.
> > > > > e.g.
> > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > assuming
> > > > > > the global RM does not distinguish individual resources of the
> same
> > > type.
> > > > > > It is true for memory, but not for GPU.
> > > > > >
> > > > > > The GPU manager is designed to do 2.b here. So it should
> discover the
> > > > > > physical GPU information and bind/match them to each operators.
> > > Making
> > > > > this
> > > > > > general will fill in the missing piece to support custom resource
> > > type
> > > > > > definition. But I'd avoid calling it a "External Resource
> Manager" to
> > > > > avoid
> > > > > > confusion with RM, maybe something like "Operator Resource
> Assigner"
> > > > > would
> > > > > > be more accurate. So for each resource type users can have an
> > > optional
> > > > > > "Operator Resource Assigner" in the TM. For memory, users don't
> need
> > > > > this,
> > > > > > but for other extended resources, users may need that.
> > > > > >
> > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > achievable
> > > > > > in this FLIP. But I am also OK with having that in a separate
> FLIP
> > > > > because
> > > > > > the interface between the "Operator Resource Assigner" and
> operator
> > > may
> > > > > > take a while to settle down if we want to make it generic. But I
> > > think
> > > > > our
> > > > > > implementation should take this future work into consideration so
> > > that we
> > > > > > don't need to break backwards compatibility once we have that.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Thank you for writing this FLIP.
> > > > > > >
> > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > scheduling
> > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > >
> > > > > > > One thought I had when reading the proposal is if it makes
> sense to
> > > > > look
> > > > > > at
> > > > > > > the "GPU Manager" as an "External Resource Manager", and GPU
> is one
> > > > > such
> > > > > > > resource.
> > > > > > > The way I understand the ResourceProfile and ResourceSpec,
> that is
> > > how
> > > > > it
> > > > > > > is done there.
> > > > > > > It has the advantage that it looks more extensible. Maybe
> there is
> > > a
> > > > > GPU
> > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> Resource, a
> > > > > Alibaba
> > > > > > > TPU Resource, etc.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> becket.qin@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the FLIP, Yangze. GPU resource management support is a
> > > > > > > > must-have for machine learning use cases. Actually, it is one of the
> > > > > > > > most frequently asked questions from users who are interested in
> > > > > > > > using Flink for ML.
> > > > > > > >
> > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > 1. The WebUI / REST API should probably also be mentioned in
> the
> > > > > public
> > > > > > > > interface section.
> > > > > > > > 2. Is the data structure that holds GPU info also a public
> API?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > tonysong820@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for drafting the FLIP and kicking off the
> discussion,
> > > > > Yangze.
> > > > > > > > >
> > > > > > > > > Big +1 for this feature. Supporting using of GPU in Flink
> is
> > > > > > > significant,
> > > > > > > > > especially for the ML scenarios.
> > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> > > think
> > > > > > it's a
> > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> karmagyz@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > We would like to start a discussion thread on "FLIP-108:
> Add
> > > GPU
> > > > > > > > > > support in Flink"[1].
> > > > > > > > > >
> > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > >
> > > > > > > > > > - Enable user to configure how many GPUs in a task
> executor
> > > and
> > > > > > > > > > forward such requirements to the external resource
> managers
> > > (for
> > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > - Provide information of available GPU resources to
> > > operators.
> > > > > > > > > >
> > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > >
> > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > - Introduce GPUManager as one of the task manager
> services to
> > > > > > > discover
> > > > > > > > > > and expose GPU resource information to the context of
> > > functions.
> > > > > > > > > > - Introduce the default script for GPU discovery, in
> which we
> > > > > > provide
> > > > > > > > > > the privilege mode to help user to achieve worker-level
> > > isolation
> > > > > > in
> > > > > > > > > > standalone mode.
> > > > > > > > > >
> > > > > > > > > > Please find more details in the FLIP wiki document [1].
> > > Looking
> > > > > > > forward
> > > > > > > > > to
> > > > > > > > > > your feedbacks.
> > > > > > > > > >
> > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>
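
The configuration naming discussed in the quoted thread (the general schema
taskmanager.resource.{resourceName}.amount, Xingbo's shorter
taskmanager.gpu.* form, and Xintong's remark that compatibility can be kept
"with simple key mapping") can be sketched in Java as below. This is only an
illustration under assumptions: the class ExtendedResourceKeys, its methods
and the fallback order are hypothetical and are not Flink's ConfigOption
mechanism.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; not part of Flink. */
final class ExtendedResourceKeys {
    private ExtendedResourceKeys() {}

    /** General schema from the thread: taskmanager.resource.{resourceName}.amount */
    static String amountKey(String resourceName) {
        return "taskmanager.resource." + resourceName + ".amount";
    }

    /** Shorter form without the "resource" level, as suggested by Xingbo. */
    static String shortKey(String generalKey) {
        return generalKey.replaceFirst("^taskmanager\\.resource\\.", "taskmanager.");
    }

    /** Simple key mapping: prefer the general key, fall back to the short one. */
    static Optional<String> lookup(Map<String, String> conf, String generalKey) {
        String value = conf.get(generalKey);
        if (value == null) {
            value = conf.get(shortKey(generalKey));
        }
        return Optional.ofNullable(value);
    }
}
```

For example, a configuration that only sets taskmanager.gpu.amount would
still be found by lookup(conf, amountKey("gpu")), so either naming decision
could be changed later without breaking existing setups.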

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Stephan Ewen <se...@apache.org>.
> > Can we somehow keep this out of the TaskManager services
> I fear that we could not. IMO, the GPUManager(or
> ExternalServicesManagers in future) is conceptually one of the task
> manager services, just like MemoryManager before 1.10.
> - It maintains/holds the GPU resource at TM level and all of the
> operators allocate the GPU resources from it. So, it should be
> exclusive to a single TaskExecutor.
> - We could add a collection called ExternalResourceManagers to hold
> all managers of other external resources in the future.
>

Can you help me understand why this needs to be added to TaskManagerServices
or to the RuntimeContext?
Are you worried about the case when multiple Task Executors run in the same
JVM? That's not common, but wouldn't it actually be good in that case to
share the GPU Manager, given that the GPU is shared?

Thanks,
Stephan
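
To make the alternative concrete, here is a minimal Java sketch of a
self-contained, JVM-wide manager that operators reach via a static get().
Everything in it is an assumption for illustration: the class name
GpuManagerSketch, the initialize/allocate/release methods, and the use of a
plain GPU count in place of the discovery script. It is not the FLIP-108
implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical JVM-wide GPU manager; shared by all TaskExecutors in the process. */
final class GpuManagerSketch {
    private static GpuManagerSketch instance;

    // Indexes of GPUs not yet handed out. A real manager would fill this
    // from the discovery script; here it is simply 0..gpuCount-1.
    private final List<Integer> freeGpuIndexes = new ArrayList<>();

    private GpuManagerSketch(int gpuCount) {
        for (int i = 0; i < gpuCount; i++) {
            freeGpuIndexes.add(i);
        }
    }

    /** Creates the shared instance on first call; later calls return the same one. */
    static synchronized GpuManagerSketch initialize(int gpuCount) {
        if (instance == null) {
            instance = new GpuManagerSketch(gpuCount);
        }
        return instance;
    }

    static synchronized GpuManagerSketch get() {
        if (instance == null) {
            throw new IllegalStateException("GpuManagerSketch not initialized");
        }
        return instance;
    }

    /** Hands out the requested number of GPU indexes, or fails if too few are free. */
    synchronized List<Integer> allocate(int amount) {
        if (amount < 0 || amount > freeGpuIndexes.size()) {
            throw new IllegalArgumentException("cannot allocate " + amount + " GPUs");
        }
        List<Integer> granted = new ArrayList<>(freeGpuIndexes.subList(0, amount));
        freeGpuIndexes.subList(0, amount).clear();
        return Collections.unmodifiableList(granted);
    }

    synchronized void release(List<Integer> indexes) {
        freeGpuIndexes.addAll(indexes);
    }

    synchronized int available() {
        return freeGpuIndexes.size();
    }
}
```

The trade-off is the one raised in this thread: a static accessor keeps the
TaskManager services untouched and naturally shares the GPUs across
TaskExecutors in one JVM, while going through the RuntimeContext keeps
operators decoupled from the manager's implementation.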

---------------------------


> What parts need information about this?
> In this FLIP, operators need the information. Thus, we expose GPU
> information to the RuntimeContext/FunctionContext. The slot profile is
> not aware of GPU resources as GPU is TM level resource now.
>
> > Can the GPU Manager be a "self contained" thing that simply takes the
> configuration, and then abstracts everything internally?
> Yes, we just pass the path/args of the discovery script and how many
> GPUs per TM to it. It takes the responsibility to get the GPU
> information and expose it to the RuntimeContext/FunctionContext of
> operators. Meanwhile, we'd better not allow operators to directly
> access GPUManager; they should get what they want from the Context. We
> could then decouple the interface/implementation of GPUManager from the
> Public API.
>
> Best,
> Yangze Guo
>
> On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org> wrote:
> >
> > It sounds fine to initially start with GPU specific support and think about
> > generalizing this once we better understand the space.
> >
> > About the implementation suggested in FLIP-108:
> >   - Can we somehow keep this out of the TaskManager services? Anything we
> > have to pull through all layers of the TM makes the TM components yet more
> > complex and harder to maintain.
> >
> >   - What parts need information about this?
> >     -> do the slot profiles need information about the GPU?
> >     -> Can the GPU Manager be a "self contained" thing that simply takes
> > the configuration, and then abstracts everything internally? Operators can
> > access it via "GPUManager.get()" or so?
> >
> >
> >
> > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Thanks for all the feedback.
> > >
> > > @Becket
> > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> > > Public API section.
> > >
> > >
> > > @Stephan @Becket
> > > Regarding the general extended resource mechanism, I second Xintong's
> > > suggestion.
> > > - It's better to leverage ResourceProfile and ResourceSpec after we
> > > support fine-grained GPU scheduling. As a first-step proposal, I
> > > prefer not to include it in the scope of this FLIP.
> > > - Regarding the "Extended Resource Manager", if I understand
> > > correctly, it is just a code refactoring atm; we could extract the
> > > open/close/allocateExtendResources of GPUManager to that interface. If
> > > that is the case, +1 to do it during implementation.
> > >
> > > @Xingbo
> > > As Xintong said, we looked into how Spark supports a general "Custom
> > > Resource Scheduling" before and decided to introduce a common resource
> > > configuration
> > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > to make it more extensible. I think the "resource" is a proper level
> > > to contain all the configs of extended resources.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hx...@gmail.com>
> wrote:
> > > >
> > > > Thanks a lot for the FLIP, Yangze.
> > > >
> > > > There is no doubt that GPU resource management support will greatly
> > > > facilitate the development of AI-related applications by PyFlink
> users.
> > > >
> > > > I have only one comment about this wiki:
> > > >
> > > > Regarding the names of several GPU configurations, I think it is better
> > > > to delete the resource field, which makes them consistent with the names
> > > > of other resource-related configurations in TaskManagerOption.
> > > >
> > > > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > > > taskmanager.gpu.discovery-script.path
> > > >
> > > > Best,
> > > >
> > > > Xingbo
> > > >
> > > >
> > > > Xintong Song <to...@gmail.com> 于2020年3月4日周三 上午10:39写道:
> > > >
> > > > > @Stephan, @Becket,
> > > > >
> > > > > Actually, Yangze, Yang and I also had an offline discussion about
> > > making
> > > > > the "GPU Support" as some general "Extended Resource Support". We
> > > believe
> > > > > supporting extended resources in a general mechanism is definitely
> a
> > > good
> > > > > and extensible way. The reason we propose this FLIP narrowing its
> scope
> > > > > down to GPU alone, is mainly for the concern on extra efforts and
> > > review
> > > > > capacity needed for a general mechanism.
> > > > >
> > > > > To come up with a well design on a general extended resource
> management
> > > > > mechanism, we would need to investigate more on how people use
> > > different
> > > > > kind of resources in practice. For GPU, we learnt such knowledge
> from
> > > the
> > > > > experts, Becket and his team members. But for FPGA, or other
> potential
> > > > > extended resources, we don't have such convenient information
> sources,
> > > > > making the investigation requires more efforts, which I tend to
> think
> > > is
> > > > > not necessary atm.
> > > > >
> > > > > On the other hand, we also looked into how Spark supports a general
> > > "Custom
> > > > > Resource Scheduling". Assuming we want to have a similar general
> > > extended
> > > > > resource mechanism in the future, we believe that the current GPU
> > > support
> > > > > design can be easily extended, in an incremental way without too
> many
> > > > > reworks.
> > > > >
> > > > >    - The most important part is probably user interfaces. Spark
> offers
> > > > >    configuration options to define the amount, discovery script and
> > > vendor
> > > > > (on
> > > > >    k8s) in a per resource type bias [1], which is very similar to
> what
> > > we
> > > > >    proposed in this FLIP. I think it's not necessary to expose
> config
> > > > > options
> > > > >    in the general way atm, since we do not have supports for other
> > > resource
> > > > >    types now. If later we decided to have per resource type config
> > > > > options, we
> > > > >    can have backwards compatibility on the current proposed options
> > > with
> > > > >    simple key mapping.
> > > > >    - For the GPU Manager, if later needed we can change it to a
> > > "Extended
> > > > >    Resource Manager" (or whatever it is called). That should be a
> pure
> > > > >    component-internal refactoring.
> > > > >    - For ResourceProfile and ResourceSpec, there are already
> fields for
> > > > >    general extended resource. We can of course leverage them when
> > > > > supporting
> > > > >    fine grained GPU scheduling. That is also not in the scope of
> this
> > > first
> > > > >    step proposal, and would require FLIP-56 to be finished first.
> > > > >
> > > > > To summary up, I agree with Becket that have a separate FLIP for
> the
> > > > > general extended resource mechanism, and keep it in mind when
> > > discussing
> > > > > and implementing the current one.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > >
> > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > > > That's a good point, Stephan. It makes total sense to generalize
> the
> > > > > > resource management to support custom resources. Having that
> allows
> > > users
> > > > > > to add new resources by themselves. The general resource
> management
> > > may
> > > > > > involve two different aspects:
> > > > > >
> > > > > > 1. The custom resource type definition. It is supported by the
> > > extended
> > > > > > resources in ResourceProfile and ResourceSpec. This will likely
> cover
> > > > > > majority of the cases.
> > > > > >
> > > > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > resources
> > > > > > to different tasks, operators, and so on. This may require two
> > > levels /
> > > > > > steps:
> > > > > >     a. Subtask level - make sure the subtasks are put into
> suitable
> > > > > slots.
> > > > > > It is done by the global RM and is not customizable right now.
> > > > > >     b. Operator level - map the exact resource to the operators
> in
> > > TM.
> > > > > e.g.
> > > > > > GPU 1 for operator A, GPU 2 for operator B. This step is needed
> > > assuming
> > > > > > the global RM does not distinguish individual resources of the
> same
> > > type.
> > > > > > It is true for memory, but not for GPU.
> > > > > >
> > > > > > The GPU manager is designed to do 2.b here. So it should
> discover the
> > > > > > physical GPU information and bind/match them to each operators.
> > > Making
> > > > > this
> > > > > > general will fill in the missing piece to support custom resource
> > > type
> > > > > > definition. But I'd avoid calling it a "External Resource
> Manager" to
> > > > > avoid
> > > > > > confusion with RM, maybe something like "Operator Resource
> Assigner"
> > > > > would
> > > > > > be more accurate. So for each resource type users can have an
> > > optional
> > > > > > "Operator Resource Assigner" in the TM. For memory, users don't
> need
> > > > > this,
> > > > > > but for other extended resources, users may need that.
> > > > > >
> > > > > > Personally I think a pluggable "Operator Resource Assigner" is
> > > achievable
> > > > > > in this FLIP. But I am also OK with having that in a separate
> FLIP
> > > > > because
> > > > > > the interface between the "Operator Resource Assigner" and
> operator
> > > may
> > > > > > take a while to settle down if we want to make it generic. But I
> > > think
> > > > > our
> > > > > > implementation should take this future work into consideration so
> > > that we
> > > > > > don't need to break backwards compatibility once we have that.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Thank you for writing this FLIP.
> > > > > > >
> > > > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > > > scheduling
> > > > > > > and GPU allocation, as I have no experience with that.
> > > > > > >
> > > > > > > One thought I had when reading the proposal is if it makes
> sense to
> > > > > look
> > > > > > at
> > > > > > > the "GPU Manager" as an "External Resource Manager", and GPU
> is one
> > > > > such
> > > > > > > resource.
> > > > > > > The way I understand the ResourceProfile and ResourceSpec,
> that is
> > > how
> > > > > it
> > > > > > > is done there.
> > > > > > > It has the advantage that it looks more extensible. Maybe
> there is
> > > a
> > > > > GPU
> > > > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA
> Resource, a
> > > > > Alibaba
> > > > > > > TPU Resource, etc.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <
> becket.qin@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the FLIP Yangze. GPU resource management support
> is a
> > > > > > > must-have
> > > > > > > > for machine learning use cases. Actually it is one of the
> mostly
> > > > > asked
> > > > > > > > question from the users who are interested in using Flink
> for ML.
> > > > > > > >
> > > > > > > > Some quick comments / questions to the wiki.
> > > > > > > > 1. The WebUI / REST API should probably also be mentioned in
> the
> > > > > public
> > > > > > > > interface section.
> > > > > > > > 2. Is the data structure that holds GPU info also a public
> API?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> > > tonysong820@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for drafting the FLIP and kicking off the
> discussion,
> > > > > Yangze.
> > > > > > > > >
> > > > > > > > > Big +1 for this feature. Supporting using of GPU in Flink
> is
> > > > > > > significant,
> > > > > > > > > especially for the ML scenarios.
> > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> > > think
> > > > > > it's a
> > > > > > > > > very good first step for Flink's GPU supports.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <
> karmagyz@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > We would like to start a discussion thread on "FLIP-108:
> Add
> > > GPU
> > > > > > > > > > support in Flink"[1].
> > > > > > > > > >
> > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > >
> > > > > > > > > > - Enable user to configure how many GPUs in a task
> executor
> > > and
> > > > > > > > > > forward such requirements to the external resource
> managers
> > > (for
> > > > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > - Provide information of available GPU resources to
> > > operators.
> > > > > > > > > >
> > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > >
> > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > - Introduce GPUManager as one of the task manager
> services to
> > > > > > > discover
> > > > > > > > > > and expose GPU resource information to the context of
> > > functions.
> > > > > > > > > > - Introduce the default script for GPU discovery, in
> which we
> > > > > > provide
> > > > > > > > > > the privilege mode to help user to achieve worker-level
> > > isolation
> > > > > > in
> > > > > > > > > > standalone mode.
> > > > > > > > > >
> > > > > > > > > > Please find more details in the FLIP wiki document [1].
> > > Looking
> > > > > > > forward
> > > > > > > > > to
> > > > > > > > > > your feedbacks.
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Yangze Guo <ka...@gmail.com>.
Thanks for the feedback, Stephan.

> Can we somehow keep this out of the TaskManager services
I'm afraid we cannot. IMO, the GPUManager (or
ExternalServicesManagers in the future) is conceptually one of the task
manager services, just like the MemoryManager before 1.10.
- It holds the GPU resources at the TM level, and all of the
operators allocate their GPU resources from it. So it should be
exclusive to a single TaskExecutor.
- We could add a collection called ExternalResourceManagers to hold
all managers of other external resources in the future.
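
To make the second point a bit more concrete, here is a rough sketch of
what such a collection could look like. Everything in it is an assumption
for illustration only: the ExternalResourceManager interface, its
open/close methods and the StubGpuManager are hypothetical names, not
existing Flink classes and not the final design of this FLIP.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical common interface for TM-level external resource managers. */
interface ExternalResourceManager {
    String resourceName();
    void open();
    void close();
}

/** Hypothetical stand-in for the GPUManager, only used to show registration. */
final class StubGpuManager implements ExternalResourceManager {
    boolean opened;

    @Override public String resourceName() { return "gpu"; }
    @Override public void open() { opened = true; }
    @Override public void close() { opened = false; }
}

/** Hypothetical holder; one instance would belong to exactly one TaskExecutor. */
final class ExternalResourceManagers {
    private final Map<String, ExternalResourceManager> managers = new LinkedHashMap<>();

    /** Opens the manager and stores it under its resource name. */
    void register(ExternalResourceManager manager) {
        if (managers.containsKey(manager.resourceName())) {
            throw new IllegalStateException(
                    "Duplicate manager for resource " + manager.resourceName());
        }
        manager.open();
        managers.put(manager.resourceName(), manager);
    }

    /** Returns the manager for the given resource name, if one was registered. */
    Optional<ExternalResourceManager> get(String resourceName) {
        return Optional.ofNullable(managers.get(resourceName));
    }

    /** Closes every registered manager, e.g. when the TaskExecutor shuts down. */
    void closeAll() {
        for (ExternalResourceManager manager : managers.values()) {
            manager.close();
        }
        managers.clear();
    }
}
```

With something like this, the task manager services would only hold one
field for the collection, and a new resource type would just register
another manager instead of adding another service.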

> What parts need information about this?
In this FLIP, only the operators need the information. Thus, we expose the GPU
information through the RuntimeContext/FunctionContext. The slot profile is
not aware of GPU resources, as GPU is a TM-level resource for now.

> Can the GPU Manager be a "self contained" thing that simply takes the configuration, and then abstracts everything internally?
Yes, we just pass the path/args of the discovery script and the number of
GPUs per TM to it. It takes the responsibility of getting the GPU
information and exposing it to the RuntimeContext/FunctionContext of the
operators. Meanwhile, we'd better not allow operators to access the
GPUManager directly; they should get what they need from the Context. We could
then decouple the interface/implementation of the GPUManager from the Public
API.
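
As a rough illustration of that decoupling, here is a minimal sketch. All
names in it (GpuInfo, GpuDiscovery, GpuAwareContext and the constructor
arguments) are hypothetical and only meant to show the shape of the idea;
the real interfaces are what the FLIP wiki will define. Running the
discovery script is replaced by a pluggable GpuDiscovery so that the
sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical value object describing one discovered GPU. */
final class GpuInfo {
    private final String index;

    GpuInfo(String index) { this.index = index; }

    String getIndex() { return index; }
}

/** Hypothetical stand-in for executing the discovery script. */
interface GpuDiscovery {
    List<String> discover(String scriptPath, String scriptArgs, int amount);
}

/** Hypothetical TM-level service: takes only configuration, hides discovery. */
final class GpuManager {
    private final List<GpuInfo> gpus;

    GpuManager(String scriptPath, String scriptArgs, int gpusPerTaskManager,
               GpuDiscovery discovery) {
        List<String> indexes = discovery.discover(scriptPath, scriptArgs, gpusPerTaskManager);
        if (indexes.size() < gpusPerTaskManager) {
            throw new IllegalStateException("Discovered " + indexes.size()
                    + " GPUs but " + gpusPerTaskManager + " were configured");
        }
        // Keep exactly the configured number of GPUs for this TaskExecutor.
        List<GpuInfo> found = new ArrayList<>();
        for (String index : indexes.subList(0, gpusPerTaskManager)) {
            found.add(new GpuInfo(index));
        }
        this.gpus = Collections.unmodifiableList(found);
    }

    List<GpuInfo> getGpuInfos() { return gpus; }
}

/** Hypothetical slice of RuntimeContext/FunctionContext that operators see. */
interface GpuAwareContext {
    List<GpuInfo> getGpuInfos();
}

/** Context implementation; the only place that touches the GpuManager. */
final class DefaultGpuAwareContext implements GpuAwareContext {
    private final GpuManager manager;

    DefaultGpuAwareContext(GpuManager manager) { this.manager = manager; }

    @Override
    public List<GpuInfo> getGpuInfos() { return manager.getGpuInfos(); }
}
```

An operator would only ever call getGpuInfos() on its context, so the
GpuManager could later be refactored or generalized without touching the
Public API.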

Best,
Yangze Guo


Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Stephan Ewen <se...@apache.org>.
It sounds fine to initially start with GPU-specific support and think about
generalizing this once we better understand the space.

About the implementation suggested in FLIP-108:
  - Can we somehow keep this out of the TaskManager services? Anything we
have to pull through all layers of the TM makes the TM components yet more
complex and harder to maintain.

  - What parts need information about this?
    -> Do the slot profiles need information about the GPU?
    -> Can the GPU Manager be a "self contained" thing that simply takes
the configuration, and then abstracts everything internally? Operators can
access it via "GPUManager.get()" or so?



On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <ka...@gmail.com> wrote:

> Thanks for all the feedback.
>
> @Becket
> Regarding the WebUI and GPUInfo, you're right, I'll add them to the
> Public API section.
>
>
> @Stephan @Becket
> Regarding the general extended resource mechanism, I second Xintong's
> suggestion.
> - It's better to leverage ResourceProfile and ResourceSpec once we
> support fine-grained GPU scheduling. As a first-step proposal, I
> prefer not to include it in the scope of this FLIP.
> - Regarding the "Extended Resource Manager", if I understand
> correctly, it is just a code refactoring atm; we could extract the
> open/close/allocateExtendResources of GPUManager to that interface. If
> that is the case, +1 to do it during implementation.
>
> @Xingbo
> As Xintong said, we looked into how Spark supports a general "Custom
> Resource Scheduling" before and decided to introduce a common resource
> configuration schema
> (taskmanager.resource.{resourceName}.amount/discovery-script)
> to make it more extensible. I think "resource" is the proper level
> to contain all the configs of extended resources.
>
> Best,
> Yangze Guo
>
> On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hx...@gmail.com> wrote:
> >
> > Thanks a lot for the FLIP, Yangze.
> >
> > There is no doubt that GPU resource management support will greatly
> > facilitate the development of AI-related applications by PyFlink users.
> >
> > I have only one comment about this wiki:
> >
> > Regarding the names of several GPU configurations, I think it is better
> > to delete the "resource" field, which makes them consistent with the names
> > of other resource-related configurations in TaskManagerOption.
> >
> > e.g. taskmanager.resource.gpu.discovery-script.path ->
> > taskmanager.gpu.discovery-script.path
> >
> > Best,
> >
> > Xingbo
> >
> >
> > Xintong Song <to...@gmail.com> wrote on Wed, Mar 4, 2020 at 10:39 AM:
> >
> > > @Stephan, @Becket,
> > >
> > > Actually, Yangze, Yang and I also had an offline discussion about making
> > > the "GPU Support" a general "Extended Resource Support". We believe
> > > supporting extended resources through a general mechanism is definitely a
> > > good and extensible approach. The reason we propose narrowing this FLIP's
> > > scope down to GPU alone is mainly the concern about the extra effort and
> > > review capacity needed for a general mechanism.
> > >
> > > To come up with a good design for a general extended resource management
> > > mechanism, we would need to investigate more how people use different
> > > kinds of resources in practice. For GPU, we learnt such knowledge from the
> > > experts, Becket and his team members. But for FPGA, or other potential
> > > extended resources, we don't have such convenient information sources,
> > > so the investigation would require more effort, which I tend to think is
> > > not necessary atm.
> > >
> > > On the other hand, we also looked into how Spark supports a general
> > > "Custom Resource Scheduling". Assuming we want to have a similar general
> > > extended resource mechanism in the future, we believe that the current GPU
> > > support design can be easily extended incrementally, without too much
> > > rework.
> > >
> > >    - The most important part is probably the user interfaces. Spark offers
> > >    configuration options to define the amount, discovery script and
> > >    vendor (on k8s) on a per-resource-type basis [1], which is very similar
> > >    to what we proposed in this FLIP. I think it's not necessary to expose
> > >    config options in the general way atm, since we do not support other
> > >    resource types now. If we later decide to have per-resource-type config
> > >    options, we can keep backwards compatibility with the currently
> > >    proposed options through simple key mapping.
> > >    - For the GPU Manager, if needed later we can change it to an
> > >    "Extended Resource Manager" (or whatever it is called). That should be
> > >    a pure component-internal refactoring.
> > >    - For ResourceProfile and ResourceSpec, there are already fields for
> > >    general extended resources. We can of course leverage them when
> > >    supporting fine-grained GPU scheduling. That is also not in the scope
> > >    of this first-step proposal, and would require FLIP-56 to be finished
> > >    first.
> > >
> > > To sum up, I agree with Becket that we should have a separate FLIP for the
> > > general extended resource mechanism, and keep it in mind when discussing
> > > and implementing the current one.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > > [1]
> > > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > >
> > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <be...@gmail.com> wrote:
> > >
> > > > That's a good point, Stephan. It makes total sense to generalize the
> > > > resource management to support custom resources. Having that allows
> > > > users to add new resources by themselves. The general resource
> > > > management may involve two different aspects:
> > > >
> > > > 1. The custom resource type definition. It is supported by the extended
> > > > resources in ResourceProfile and ResourceSpec. This will likely cover
> > > > the majority of the cases.
> > > >
> > > > 2. The custom resource allocation logic, i.e. how to assign the
> > > > resources to different tasks, operators, and so on. This may require two
> > > > levels / steps:
> > > >     a. Subtask level - make sure the subtasks are put into suitable
> > > >     slots. It is done by the global RM and is not customizable right now.
> > > >     b. Operator level - map the exact resource to the operators in the
> > > >     TM, e.g. GPU 1 for operator A, GPU 2 for operator B. This step is
> > > >     needed assuming the global RM does not distinguish individual
> > > >     resources of the same type. It is true for memory, but not for GPU.
> > > >
> > > > The GPU manager is designed to do 2.b here. So it should discover the
> > > > physical GPU information and bind/match them to each operator. Making
> > > > this general will fill in the missing piece to support custom resource
> > > > type definition. But I'd avoid calling it an "External Resource Manager"
> > > > to avoid confusion with the RM; maybe something like "Operator Resource
> > > > Assigner" would be more accurate. So for each resource type users can
> > > > have an optional "Operator Resource Assigner" in the TM. For memory,
> > > > users don't need this, but for other extended resources, users may need
> > > > that.
> > > >
> > > > Personally I think a pluggable "Operator Resource Assigner" is
> achievable
> > > > in this FLIP. But I am also OK with having that in a separate FLIP
> > > because
> > > > the interface between the "Operator Resource Assigner" and operator
> may
> > > > take a while to settle down if we want to make it generic. But I
> think
> > > our
> > > > implementation should take this future work into consideration so
> that we
> > > > don't need to break backwards compatibility once we have that.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <se...@apache.org>
> wrote:
> > > >
> > > > > Thank you for writing this FLIP.
> > > > >
> > > > > I cannot really give much input into the mechanics of GPU-aware
> > > > scheduling
> > > > > and GPU allocation, as I have no experience with that.
> > > > >
> > > > > One thought I had when reading the proposal is if it makes sense to
> > > look
> > > > at
> > > > > the "GPU Manager" as an "External Resource Manager", and GPU is one
> > > such
> > > > > resource.
> > > > > The way I understand the ResourceProfile and ResourceSpec, that is
> how
> > > it
> > > > > is done there.
> > > > > It has the advantage that it looks more extensible. Maybe there is
> a
> > > GPU
> > > > > Resource, a specialized NVIDIA GPU Resource, and FPGA Resource, a
> > > Alibaba
> > > > > TPU Resource, etc.
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Thanks for the FLIP Yangze. GPU resource management support is a
> > > > > must-have
> > > > > > for machine learning use cases. Actually it is one of the mostly
> > > asked
> > > > > > question from the users who are interested in using Flink for ML.
> > > > > >
> > > > > > Some quick comments / questions to the wiki.
> > > > > > 1. The WebUI / REST API should probably also be mentioned in the
> > > public
> > > > > > interface section.
> > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <
> tonysong820@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for drafting the FLIP and kicking off the discussion,
> > > Yangze.
> > > > > > >
> > > > > > > Big +1 for this feature. Supporting using of GPU in Flink is
> > > > > significant,
> > > > > > > especially for the ML scenarios.
> > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I
> think
> > > > it's a
> > > > > > > very good first step for Flink's GPU supports.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karmagyz@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > We would like to start a discussion thread on "FLIP-108: Add
> GPU
> > > > > > > > support in Flink"[1].
> > > > > > > >
> > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > >
> > > > > > > > - Enable user to configure how many GPUs in a task executor
> and
> > > > > > > > forward such requirements to the external resource managers
> (for
> > > > > > > > Kubernetes/Yarn/Mesos setups).
> > > > > > > > - Provide information of available GPU resources to
> operators.
> > > > > > > >
> > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > >
> > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > - Introduce GPUManager as one of the task manager services to
> > > > > discover
> > > > > > > > and expose GPU resource information to the context of
> functions.
> > > > > > > > - Introduce the default script for GPU discovery, in which we
> > > > provide
> > > > > > > > the privilege mode to help user to achieve worker-level
> isolation
> > > > in
> > > > > > > > standalone mode.
> > > > > > > >
> > > > > > > > Please find more details in the FLIP wiki document [1].
> Looking
> > > > > forward
> > > > > > > to
> > > > > > > > your feedbacks.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Yangze Guo <ka...@gmail.com>.
Thanks for all the feedback.

@Becket
Regarding the WebUI and GPUInfo, you're right, I'll add them to the
Public API section.


@Stephan @Becket
Regarding the general extended resource mechanism, I second Xintong's
suggestion.
- It's better to leverage ResourceProfile and ResourceSpec once we
support fine-grained GPU scheduling. As this is a first-step proposal, I
prefer not to include it in the scope of this FLIP.
- Regarding the "Extended Resource Manager", if I understand
correctly, it is just a code refactoring atm; we could extract the
open/close/allocateExtendResources of GPUManager into that interface. If
that is the case, +1 to doing it during implementation.
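
A minimal sketch of what such an extraction might look like, written in
Python only for brevity. The interface name, the method names and the
in-memory list of GPU indexes are assumptions for illustration and are
not taken from the FLIP:

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class ExtendedResourceManager(ABC):
    """Hypothetical interface extracted from GPUManager (assumed name)."""

    @abstractmethod
    def open(self) -> None:
        """Discover the resources and make them available."""

    @abstractmethod
    def close(self) -> None:
        """Release everything held by this manager."""

    @abstractmethod
    def allocate_extended_resources(self, amount: int) -> List[str]:
        """Hand out `amount` resource identifiers to the caller."""


class GPUManager(ExtendedResourceManager):
    """Illustrative implementation backed by a fixed list of GPU indexes."""

    def __init__(self, discovered_indexes: Iterable[str]):
        # In a real setup these would come from the discovery script.
        self._discovered = list(discovered_indexes)
        self._free: List[str] = []

    def open(self) -> None:
        self._free = list(self._discovered)

    def close(self) -> None:
        self._free = []

    def allocate_extended_resources(self, amount: int) -> List[str]:
        if amount > len(self._free):
            raise RuntimeError("not enough free GPUs")
        allocated, self._free = self._free[:amount], self._free[amount:]
        return allocated
```

With this shape, another resource type would only need its own subclass,
while the task executor keeps talking to the common interface.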

@Xingbo
As Xintong said, we looked into how Spark supports a general "Custom
Resource Scheduling" before and decided to introduce a common resource
configuration schema
(taskmanager.resource.{resourceName}.amount/discovery-script)
to make it more extensible. I think "resource" is the proper level
to contain all the configs of extended resources.
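
To illustrate the schema above, here is a small Python sketch that groups
flat config keys of that form by resource name. The helper name, the
sample values and the script path are made up for the example; only the
key pattern comes from the schema mentioned above:

```python
import re
from typing import Dict, Mapping

# Matches taskmanager.resource.{resourceName}.amount and
# taskmanager.resource.{resourceName}.discovery-script
_RESOURCE_KEY = re.compile(
    r"^taskmanager\.resource\.([^.]+)\.(amount|discovery-script)$"
)


def group_resource_options(flat_config: Mapping[str, str]) -> Dict[str, Dict[str, str]]:
    """Collect the per-resource options from a flat key/value config."""
    grouped: Dict[str, Dict[str, str]] = {}
    for key, value in flat_config.items():
        match = _RESOURCE_KEY.match(key)
        if match:
            resource_name, option = match.groups()
            grouped.setdefault(resource_name, {})[option] = value
    return grouped


example = {
    "taskmanager.resource.gpu.amount": "2",
    "taskmanager.resource.gpu.discovery-script": "/opt/scripts/gpu.sh",  # hypothetical path
    "taskmanager.resource.fpga.amount": "1",
    "taskmanager.memory.process.size": "4g",  # unrelated key, ignored
}
resources = group_resource_options(example)
```

Keys outside the taskmanager.resource.* namespace are skipped, so a new
resource type needs no change in the parsing code.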

Best,
Yangze Guo

On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hx...@gmail.com> wrote:
>
> Thanks a lot for the FLIP, Yangze.
>
> There is no doubt that GPU resource management support will greatly
> facilitate the development of AI-related applications by PyFlink users.
>
> I have only one comment about this wiki:
>
> Regarding the names of several GPU configurations, I think it is better to
> delete the resource field makes it consistent with the names of other
> resource-related configurations in TaskManagerOption.
>
> e.g. taskmanager.resource.gpu.discovery-script.path ->
> taskmanager.gpu.discovery-script.path
>
> Best,
>
> Xingbo
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Xingbo Huang <hx...@gmail.com>.
Thanks a lot for the FLIP, Yangze.

There is no doubt that GPU resource management support will greatly
facilitate the development of AI-related applications by PyFlink users.

I have only one comment about this wiki:

Regarding the names of several GPU configurations, I think it is better to
delete the resource field, which makes them consistent with the names of other
resource-related configurations in TaskManagerOptions.

e.g. taskmanager.resource.gpu.discovery-script.path ->
taskmanager.gpu.discovery-script.path

Best,

Xingbo


On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <to...@gmail.com> wrote:

> @Stephan, @Becket,
>
> Actually, Yangze, Yang and I also had an offline discussion about making
> the "GPU Support" as some general "Extended Resource Support". We believe
> supporting extended resources in a general mechanism is definitely a good
> and extensible way. The reason we propose this FLIP narrowing its scope
> down to GPU alone, is mainly for the concern on extra efforts and review
> capacity needed for a general mechanism.
>
> To come up with a well design on a general extended resource management
> mechanism, we would need to investigate more on how people use different
> kind of resources in practice. For GPU, we learnt such knowledge from the
> experts, Becket and his team members. But for FPGA, or other potential
> extended resources, we don't have such convenient information sources,
> making the investigation requires more efforts, which I tend to think is
> not necessary atm.
>
> On the other hand, we also looked into how Spark supports a general "Custom
> Resource Scheduling". Assuming we want to have a similar general extended
> resource mechanism in the future, we believe that the current GPU support
> design can be easily extended, in an incremental way without too many
> reworks.
>
>    - The most important part is probably user interfaces. Spark offers
>    configuration options to define the amount, discovery script and vendor (on
>    k8s) in a per resource type bias [1], which is very similar to what we
>    proposed in this FLIP. I think it's not necessary to expose config options
>    in the general way atm, since we do not have supports for other resource
>    types now. If later we decided to have per resource type config options, we
>    can have backwards compatibility on the current proposed options with
>    simple key mapping.
>    - For the GPU Manager, if later needed we can change it to a "Extended
>    Resource Manager" (or whatever it is called). That should be a pure
>    component-internal refactoring.
>    - For ResourceProfile and ResourceSpec, there are already fields for
>    general extended resource. We can of course leverage them when supporting
>    fine grained GPU scheduling. That is also not in the scope of this first
>    step proposal, and would require FLIP-56 to be finished first.
>
> To summary up, I agree with Becket that have a separate FLIP for the
> general extended resource mechanism, and keep it in mind when discussing
> and implementing the current one.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
>
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Xintong Song <to...@gmail.com>.
@Stephan, @Becket,

Actually, Yangze, Yang and I also had an offline discussion about turning
the "GPU Support" into a general "Extended Resource Support". We believe
that supporting extended resources through a general mechanism is definitely
a good and extensible approach. The reason we narrowed the scope of this FLIP
down to GPU alone is mainly the concern about the extra effort and review
capacity needed for a general mechanism.

To come up with a good design for a general extended resource management
mechanism, we would need to investigate further how people use different
kinds of resources in practice. For GPU, we learned this from the
experts, Becket and his team members. But for FPGA or other potential
extended resources, we don't have such convenient sources of information,
so the investigation would require more effort, which I tend to think is
not necessary atm.

On the other hand, we also looked into how Spark supports a general "Custom
Resource Scheduling". Assuming we want to have a similar general extended
resource mechanism in the future, we believe that the current GPU support
design can be easily extended in an incremental way without much
rework.

   - The most important part is probably the user interfaces. Spark offers
   configuration options to define the amount, discovery script and vendor (on
   k8s) on a per-resource-type basis [1], which is very similar to what we
   proposed in this FLIP. I think it's not necessary to expose config options
   in the general way atm, since we do not support other resource
   types yet. If we later decide to have per-resource-type config options, we
   can keep backwards compatibility with the currently proposed options through
   a simple key mapping.
   - For the GPU Manager, if needed later we can change it to an "Extended
   Resource Manager" (or whatever it is called). That should be a pure
   component-internal refactoring.
   - For ResourceProfile and ResourceSpec, there are already fields for
   general extended resources. We can of course leverage them when supporting
   fine-grained GPU scheduling. That is also not in the scope of this first-step
   proposal, and would require FLIP-56 to be finished first.
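
The "simple key mapping" in the first point could be sketched roughly as
follows (Python, for illustration only). Both sets of key names here are
hypothetical placeholders, since the final option names are still under
discussion:

```python
from typing import Dict, List, Mapping, Optional

# Hypothetical mapping: new per-resource-type key -> older GPU-specific keys
# that should still be honoured. None of these names are final.
FALLBACK_KEYS: Dict[str, List[str]] = {
    "taskmanager.resource.gpu.amount": ["taskmanager.gpu.amount"],
    "taskmanager.resource.gpu.discovery-script": [
        "taskmanager.gpu.discovery-script"
    ],
}


def get_option(config: Mapping[str, str], key: str) -> Optional[str]:
    """Look up `key`, falling back to its older names if it is not set."""
    if key in config:
        return config[key]
    for old_key in FALLBACK_KEYS.get(key, []):
        if old_key in config:
            return config[old_key]
    return None


# A config written against the older, GPU-specific key still resolves.
old_style = {"taskmanager.gpu.amount": "2"}
amount = get_option(old_style, "taskmanager.resource.gpu.amount")
```

The new key takes precedence when both are present, so existing setups
keep working while new ones can move to the general naming.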

To sum up, I agree with Becket that we should have a separate FLIP for the
general extended resource mechanism, and keep it in mind when discussing
and implementing the current one.

Thank you~

Xintong Song


[1]
https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview

On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <be...@gmail.com> wrote:

> That's a good point, Stephan. It makes total sense to generalize the
> resource management to support custom resources. Having that allows users
> to add new resources by themselves. The general resource management may
> involve two different aspects:
>
> 1. The custom resource type definition. It is supported by the extended
> resources in ResourceProfile and ResourceSpec. This will likely cover
> majority of the cases.
>
> 2. The custom resource allocation logic, i.e. how to assign the resources
> to different tasks, operators, and so on. This may require two levels /
> steps:
>     a. Subtask level - make sure the subtasks are put into suitable slots.
> It is done by the global RM and is not customizable right now.
>     b. Operator level - map the exact resource to the operators in TM. e.g.
> GPU 1 for operator A, GPU 2 for operator B. This step is needed assuming
> the global RM does not distinguish individual resources of the same type.
> It is true for memory, but not for GPU.
>
> The GPU manager is designed to do 2.b here. So it should discover the
> physical GPU information and bind/match them to each operators. Making this
> general will fill in the missing piece to support custom resource type
> definition. But I'd avoid calling it a "External Resource Manager" to avoid
> confusion with RM, maybe something like "Operator Resource Assigner" would
> be more accurate. So for each resource type users can have an optional
> "Operator Resource Assigner" in the TM. For memory, users don't need this,
> but for other extended resources, users may need that.
>
> Personally I think a pluggable "Operator Resource Assigner" is achievable
> in this FLIP. But I am also OK with having that in a separate FLIP because
> the interface between the "Operator Resource Assigner" and operator may
> take a while to settle down if we want to make it generic. But I think our
> implementation should take this future work into consideration so that we
> don't need to break backwards compatibility once we have that.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Becket Qin <be...@gmail.com>.
That's a good point, Stephan. It makes total sense to generalize the
resource management to support custom resources. That would let users
add new resource types themselves. General resource management may
involve two different aspects:

1. The custom resource type definition. This is supported by the extended
resources in ResourceProfile and ResourceSpec, and will likely cover the
majority of cases.

2. The custom resource allocation logic, i.e. how to assign the resources
to different tasks, operators, and so on. This may require two levels /
steps:
    a. Subtask level - make sure the subtasks are put into suitable slots.
This is done by the global RM and is not customizable right now.
    b. Operator level - map the exact resources to the operators in the TM,
e.g. GPU 1 for operator A, GPU 2 for operator B. This step is needed assuming
the global RM does not distinguish individual resources of the same type.
Not distinguishing them is fine for memory, but not for GPUs.

The GPU manager is designed to do 2.b here, so it should discover the
physical GPUs and bind/match them to individual operators. Making this
general would fill in the missing piece for supporting custom resource
types. But I'd avoid calling it an "External Resource Manager", to avoid
confusion with the RM; something like "Operator Resource Assigner" would
be more accurate. For each resource type, users could then have an optional
"Operator Resource Assigner" in the TM. Memory does not need one, but
other extended resources may.
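
For illustration only, step 2.b could be sketched in Java roughly as
follows. This is not part of the FLIP or of Flink: the names
OperatorResourceAssigner, ExclusiveDeviceAssigner, assign and release are
hypothetical, and the sketch assumes that devices are identified by plain
strings and that each operator gets at most one exclusive device.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical interface: binds concrete resources of one type to operators in a TM. */
interface OperatorResourceAssigner<R> {
    /** Returns the resource bound to the operator, or empty if none is left. */
    Optional<R> assign(String operatorId);

    /** Returns the operator's resource (if any) to the free pool. */
    void release(String operatorId);
}

/** Illustrative assigner that gives each operator one exclusive device. */
class ExclusiveDeviceAssigner implements OperatorResourceAssigner<String> {
    private final Deque<String> free;
    private final Map<String, String> assigned = new HashMap<>();

    /** The device list is assumed to come from some discovery step, e.g. a script. */
    ExclusiveDeviceAssigner(List<String> discoveredDevices) {
        this.free = new ArrayDeque<>(discoveredDevices);
    }

    @Override
    public synchronized Optional<String> assign(String operatorId) {
        String existing = assigned.get(operatorId);
        if (existing != null) {
            return Optional.of(existing); // repeated calls return the same device
        }
        String device = free.pollFirst();
        if (device == null) {
            return Optional.empty(); // all devices are already bound
        }
        assigned.put(operatorId, device);
        return Optional.of(device);
    }

    @Override
    public synchronized void release(String operatorId) {
        String device = assigned.remove(operatorId);
        if (device != null) {
            free.addLast(device);
        }
    }
}
```

With two discovered devices, the first two operators each get their own
device, a third gets nothing until one of them releases. A resource type
such as memory would simply not register an assigner.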

Personally I think a pluggable "Operator Resource Assigner" is achievable
in this FLIP. But I am also OK with having that in a separate FLIP because
the interface between the "Operator Resource Assigner" and operator may
take a while to settle down if we want to make it generic. But I think our
implementation should take this future work into consideration so that we
don't need to break backwards compatibility once we have that.

Thanks,

Jiangjie (Becket) Qin

On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <se...@apache.org> wrote:

> Thank you for writing this FLIP.
>
> I cannot really give much input into the mechanics of GPU-aware scheduling
> and GPU allocation, as I have no experience with that.
>
> One thought I had when reading the proposal is whether it makes sense to
> look at the "GPU Manager" as an "External Resource Manager", with GPU being
> one such resource.
> The way I understand the ResourceProfile and ResourceSpec, that is how it
> is done there.
> It has the advantage that it looks more extensible. Maybe there is a GPU
> Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an Alibaba
> TPU Resource, etc.
>
> Best,
> Stephan

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Stephan Ewen <se...@apache.org>.
Thank you for writing this FLIP.

I cannot really give much input into the mechanics of GPU-aware scheduling
and GPU allocation, as I have no experience with that.

One thought I had when reading the proposal is whether it makes sense to
look at the "GPU Manager" as an "External Resource Manager", with GPU being
one such resource.
The way I understand the ResourceProfile and ResourceSpec, that is how it
is done there.
It has the advantage that it looks more extensible. Maybe there is a GPU
Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an Alibaba
TPU Resource, etc.
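
To make the "GPU is one such resource" idea concrete, here is a minimal
Java sketch in which every external resource is just a named amount. It
is a simplified stand-in, not Flink's actual ResourceProfile or
ResourceSpec: the class ExtendedResources, its methods, and the resource
names "gpu" and "fpga" are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical container of named external resources; GPU is just one key. */
final class ExtendedResources {
    private final Map<String, Double> amounts;

    ExtendedResources(Map<String, Double> amounts) {
        for (Map.Entry<String, Double> e : amounts.entrySet()) {
            if (e.getValue() < 0) {
                throw new IllegalArgumentException("negative amount for " + e.getKey());
            }
        }
        this.amounts = Collections.unmodifiableMap(new HashMap<>(amounts));
    }

    /** Amount of the named resource, or 0 if it was never declared. */
    double amountOf(String name) {
        return amounts.getOrDefault(name, 0.0);
    }

    /** Sums two requirement sets, resource name by resource name. */
    ExtendedResources merge(ExtendedResources other) {
        Map<String, Double> sum = new HashMap<>(amounts);
        other.amounts.forEach((name, amount) -> sum.merge(name, amount, Double::sum));
        return new ExtendedResources(sum);
    }

    /** True if every requested amount is covered by what is available. */
    boolean fitsInto(ExtendedResources available) {
        for (Map.Entry<String, Double> e : amounts.entrySet()) {
            if (available.amountOf(e.getKey()) < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

Under this shape, adding an FPGA or TPU resource needs no new class, only
a new key, which is the extensibility argument made above.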

Best,
Stephan


On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <be...@gmail.com> wrote:

> Thanks for the FLIP, Yangze. GPU resource management support is a must-have
> for machine learning use cases. Actually, it is one of the most frequently
> asked questions from users who are interested in using Flink for ML.
>
> Some quick comments / questions on the wiki.
> 1. The WebUI / REST API should probably also be mentioned in the public
> interface section.
> 2. Is the data structure that holds GPU info also a public API?
>
> Thanks,
>
> Jiangjie (Becket) Qin

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Becket Qin <be...@gmail.com>.
Thanks for the FLIP, Yangze. GPU resource management support is a must-have
for machine learning use cases. Actually, it is one of the most frequently
asked questions from users who are interested in using Flink for ML.

Some quick comments / questions on the wiki.
1. The WebUI / REST API should probably also be mentioned in the public
interface section.
2. Is the data structure that holds GPU info also a public API?

Thanks,

Jiangjie (Becket) Qin

On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <to...@gmail.com> wrote:

> Thanks for drafting the FLIP and kicking off the discussion, Yangze.
>
> Big +1 for this feature. Supporting the use of GPUs in Flink is significant,
> especially for the ML scenarios.
> I've reviewed the FLIP wiki doc and it looks good to me. I think it's a
> very good first step for Flink's GPU support.
>
> Thank you~
>
> Xintong Song

Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Posted by Xintong Song <to...@gmail.com>.
Thanks for drafting the FLIP and kicking off the discussion, Yangze.

Big +1 for this feature. Supporting the use of GPUs in Flink is significant,
especially for the ML scenarios.
I've reviewed the FLIP wiki doc and it looks good to me. I think it's a
very good first step for Flink's GPU support.

Thank you~

Xintong Song



On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <ka...@gmail.com> wrote:

> Hi everyone,
>
> We would like to start a discussion thread on "FLIP-108: Add GPU
> support in Flink"[1].
>
> This FLIP mainly discusses the following issues:
>
> - Enable users to configure how many GPUs a task executor has and
> forward such requirements to the external resource managers (for
> Kubernetes/Yarn/Mesos setups).
> - Provide information on available GPU resources to operators.
>
> Key changes proposed in the FLIP are as follows:
>
> - Forward GPU resource requirements to Yarn/Kubernetes.
> - Introduce GPUManager as one of the task manager services to discover
> and expose GPU resource information to the context of functions.
> - Introduce the default script for GPU discovery, in which we provide
> the privilege mode to help users achieve worker-level isolation in
> standalone mode.
>
> Please find more details in the FLIP wiki document [1]. Looking forward to
> your feedback.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
>
> Best,
> Yangze Guo
>