Posted to user@mesos.apache.org by Benjamin Mahler <bm...@apache.org> on 2016/06/21 21:57:16 UTC

GPU Support: Library Injection

Moving this to a new thread (see some context below).

It may be worth exploring adding a generic mechanism for doing label-based
injection of volumes: if a container is tagged with a particular label, we
will inject a particular volume into the container.

For Nvidia GPU containers, the operator could use this mechanism to
explicitly specify volumes for labels (if they choose to create the volumes
themselves). In addition, Mesos may ship with some provided label-based
volume injectors for "well-known" labels (like the one that Nvidia uses in
its docker containers). And finally, you could implement a module that
provides your own volume injector.
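
To make the idea concrete, here is a minimal sketch of label-based
injection, written in Python for brevity. Everything in it is hypothetical:
the Volume and Container types, the INJECTORS registry, the label name and
the paths are illustrative assumptions, not an existing Mesos API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Volume:
    host_path: str
    container_path: str
    read_only: bool = True


@dataclass
class Container:
    labels: Dict[str, str]
    volumes: List[Volume] = field(default_factory=list)


# An injector receives the label's value and returns the volumes to add.
Injector = Callable[[str], List[Volume]]

# Hypothetical registry mapping a label key to its injector. Operators,
# built-in "well-known" injectors, and modules would all register here.
INJECTORS: Dict[str, Injector] = {}


def register_injector(label: str, injector: Injector) -> None:
    INJECTORS[label] = injector


def inject_volumes(container: Container) -> Container:
    """Add the volumes of every registered injector whose label is present."""
    for key, value in container.labels.items():
        injector = INJECTORS.get(key)
        if injector is None:
            continue  # No injector for this label; leave the container alone.
        for volume in injector(value):
            if volume not in container.volumes:  # Avoid duplicate mounts.
                container.volumes.append(volume)
    return container


# Operator-specified mapping (label name and paths are made up).
register_injector(
    "example.gpu.volumes_needed",
    lambda value: [Volume("/var/lib/example/nvidia-driver", "/usr/local/nvidia")],
)
```

A container carrying the label "example.gpu.volumes_needed" would come out
of inject_volumes() with the extra read-only volume attached; containers
without a registered label are left untouched.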

Note that this is a bit different than docker volume plugins in that the
volume plugins are not allowed to *add* volumes to containers. The volumes
are specified via docker's '-v' flag (IIRC the 'nvidia-docker' wrapper adds
a '-v' flag to the underlying docker run command to inject volumes).
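
As an illustration of that wrapper approach, the sketch below rewrites a
'docker run' command line to add '-v' flags. It is only an assumption about
the general shape of such a wrapper, not nvidia-docker's actual code; the
function name and the example paths are made up.

```python
from typing import List, Tuple


def add_volume_flags(argv: List[str], volumes: List[Tuple[str, str]]) -> List[str]:
    """Return a copy of a `docker run ...` argv with one read-only
    `-v host:container:ro` flag per volume inserted right after `run`.

    Any command other than `docker run` is returned unchanged.
    """
    if len(argv) < 2 or argv[0] != "docker" or argv[1] != "run":
        return list(argv)
    flags: List[str] = []
    for host_path, container_path in volumes:
        flags += ["-v", f"{host_path}:{container_path}:ro"]
    return argv[:2] + flags + argv[2:]
```

The point of the sketch is that the volume is added on the command line by
whatever launches the container, not by a volume plugin.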

On Mon, Jun 20, 2016 at 7:59 PM, Kevin Klues <kl...@gmail.com> wrote:

> The goal is to let users leverage the nvidia Docker images
> (https://hub.docker.com/r/nvidia/) without any added effort on their
> behalf. Using docker they are able to launch containers from these
> images by simply running `nvidia-docker run ...` (i.e. they are
> unaware that a magic volume is being injected on their behalf). On
> Mesos we want the experience to be similar.
>
> In terms of providing an external component to do the library
> consolidation instead of building it into Mesos itself -- we
> considered this.  We originally planned on building this functionality
> as an isolator module (giving us the benefit of external linkage
> without having to run a separate linux process), but there are some
> limitations with the current isolator interface that prohibit us from
> doing this properly. Moreover, building it as an isolator module would
> mean that it couldn't be shared by the docker containerizer (which we
> plan to add support for in the future).
>
> On Mon, Jun 20, 2016 at 7:30 PM, Jean Christophe “JC” Martin
> <jc...@gmail.com> wrote:
> > Kevin,
> >
> > I agree about the need to create the volume, and gather the information.
> My point was not really clear, sorry.
> > My point was that it should not be different than any use case needing
> special mounts and could either be solved by passing this information at
> the time of container creation (it doesn’t seem that there are that many
> libraries, and it would not be harder than say running the mesos slave in a
> container, purely from a number of volume statements), or it could be
> solved externally as the docker volume container does with a more generic
> solution.
> >
> > Thanks,
> >
> > JC
> >
> >> On Jun 20, 2016, at 6:59 PM, Kevin Klues <kl...@gmail.com> wrote:
> >>
> >> For now we've decided to actually remove the hard dependence on libelf
> >> for the 1.0 release and spend a bit more time thinking about the right
> >> way to pull it in.
> >>
> >> Jean, to answer your question though -- someone would still need to
> >> consolidate these libraries, even if it wasn't left to Mesos to do so.
> >> These libraries are spread across the file system, and need to be
> >> pulled into a single place for easy injection. The full list of
> >> binaries / libraries are here:
> >>
> >>
> https://github.com/NVIDIA/nvidia-docker/blob/master/tools/src/nvidia/volumes.go#L109
> >>
> >> We could put this burden on the operator and trust he gets it right,
> >> or we could have Mesos programmatically do it itself. We considered
> >> just leveraging the nvidia-docker-plugin itself (instead of
> >> duplicating its functionality into mesos), but ultimately decided it
> >> was better not to introduce an external dependency on it (since it is
> >> a separate running executable, rather than a simple library, like
> >> libelf).
> >>
> >> On Mon, Jun 20, 2016 at 5:12 PM, Jean Christophe “JC” Martin
> >> <jc...@gmail.com> wrote:
> >>> As an operator not using GPUs, I feel that the burden seems misplaced,
> and disproportionate.
> >>> I assume that the operator of a GPU cluster knows the location of the
> libraries based on their OS, and could potentially provide this information
> at the time of creating the containers. I am not sure I see why this is
> something that mesos is required to do (consolidating the libraries in the
> volume, versus being a configuration/external information).
> >>>
> >>> Thanks,
> >>>
> >>> JC
> >>>
> >>>> On Jun 20, 2016, at 2:30 PM, Kevin Klues <kl...@gmail.com> wrote:
> >>>>
> >>>> Sorry, the ticket just links to the nvidia-docker project without much
> >>>> further explanation. The information at the link below should make it
> >>>> a bit more clear:
> >>>>
> >>>> https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver.
> >>>>
> >>>> The crux of the issue is that we need to be able to consolidate all of
> >>>> the Nvidia binaries/libraries into a single volume that we inject into
> >>>> a docker container.  We use libelf to get the canonical names
> >>>> of all the Nvidia libraries (i.e. SONAME in their dynamic sections) as
> >>>> well as to look up what external dependencies they have (i.e. NEEDED in
> >>>> their dynamic sections) in order to build this volume.
> >>>>
> >>>> NOTE: None of this volume support is actually in Mesos yet -- we just
> >>>> added the libelf dependence in anticipation of it.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Jun 20, 2016 at 12:59 PM, Yan Xu <xu...@apple.com> wrote:
> >>>>> It's not immediately clear from the ticket why the change from
> optional
> >>>>> dependency to required dependency though? Could you summarize?
> >>>>>
> >>>>>
> >>>>> On Sun, Jun 19, 2016 at 12:33 PM, Kevin Klues <kl...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> Thanks Zhitao,
> >>>>>>
> >>>>>> I just pushed out a review for upgrades.md and added you as a
> reviewer.
> >>>>>>
> >>>>>> The new dependence was added in the JIRA that haosdent linked, but
> the
> >>>>>> actual reason for adding the dependence is more related to:
> >>>>>> https://issues.apache.org/jira/browse/MESOS-5401
> >>>>>>
> >>>>>> On Sun, Jun 19, 2016 at 9:34 AM, haosdent <ha...@gmail.com>
> wrote:
> >>>>>>> The related issue is Change build to always enable Nvidia GPU
> support
> >>>>>>> for
> >>>>>>> Linux
> >>>>>>> My local build broke before Kevin sent out the email, and then
> >>>>>>> I found this change.
> >>>>>>>
> >>>>>>> On Mon, Jun 20, 2016 at 12:11 AM, Zhitao Li <zhitaoli.cs@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi Kevin,
> >>>>>>>>
> >>>>>>>> Thanks for letting us know. It seems like this is not called out
> in
> >>>>>>>> upgrades.md, so can you please document this additional
> dependency
> >>>>>>>> there?
> >>>>>>>>
> >>>>>>>> Also, can you include the link to the JIRA or patch requiring this
> >>>>>>>> dependency so we can have some context?
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> On Sat, Jun 18, 2016 at 10:25 AM, Kevin Klues <kl...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hello all,
> >>>>>>>>>
> >>>>>>>>> Just an FYI that the newest libmesos now has an external
> dependence
> >>>>>>>>> on
> >>>>>>>>> libelf on Linux. This dependence can be installed via the
> following
> >>>>>>>>> packages:
> >>>>>>>>>
> >>>>>>>>> CentOS 6/7:     yum install elfutils-libelf.x86_64
> >>>>>>>>> Ubuntu14.04:   apt-get install libelf1
> >>>>>>>>>
> >>>>>>>>> Alternatively you can install from source:
> >>>>>>>>> https://directory.fsf.org/wiki/Libelf
> >>>>>>>>>
> >>>>>>>>> For developers, you will also need to install the libelf headers
> in
> >>>>>>>>> order to build master. This dependency can be installed via:
> >>>>>>>>>
> >>>>>>>>> CentOS: elfutils-libelf-devel.x86_64
> >>>>>>>>> Ubuntu: libelf-dev
> >>>>>>>>>
> >>>>>>>>> Alternatively, you can install from source:
> >>>>>>>>> https://directory.fsf.org/wiki/Libelf
> >>>>>>>>>
> >>>>>>>>> The getting started guide and the support/docker_build.sh scripts
> >>>>>>>>> have
> >>>>>>>>> been updated appropriately, but you may need to update your local
> >>>>>>>>> environment if you don't yet have these packages installed.
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> ~Kevin
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Cheers,
> >>>>>>>>
> >>>>>>>> Zhitao Li
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best Regards,
> >>>>>>> Haosdent Huang
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> ~Kevin
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> ~Kevin
> >>>
> >>
> >>
> >>
> >> --
> >> ~Kevin
> >
>
>
>
> --
> ~Kevin
>