Posted to dev@spark.apache.org by Rong Ou <ro...@gmail.com> on 2019/02/08 22:27:49 UTC

building docker images for GPU

Hi spark dev,

I created a JIRA issue a while ago (
https://issues.apache.org/jira/browse/SPARK-26398) to add GPU support to
Spark docker images, and sent a PR (
https://github.com/apache/spark/pull/23347) that went through several
iterations. It was suggested that it should be discussed on the dev mailing
list, so here we are. Please chime in if you have any questions or concerns.

A little more background: I mainly looked at running XGBoost on Spark using
GPUs. Preliminary results have shown that there is potential for a
significant speedup in training time, and this seems like a popular use case
for Spark. In any event, it'd be nice for Spark to have better support for
GPUs, and building GPU-enabled Docker images seems like a useful first step.
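
To make the use case concrete, here is a rough sketch of the kind of job this
targets, assuming xgboost4j-spark with a GPU-capable XGBoost build on the
classpath (the data path and parameters below are placeholders, not something
taken from the PR):

  import org.apache.spark.sql.SparkSession
  import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

  val spark = SparkSession.builder().appName("xgboost-gpu-sketch").getOrCreate()

  // Placeholder input: a DataFrame with a vector "features" column and a "label" column.
  val training = spark.read.parquet("/path/to/train.parquet")

  val params = Map(
    "objective"   -> "binary:logistic",
    "num_round"   -> 100,
    "tree_method" -> "gpu_hist"  // GPU-accelerated histogram algorithm
  )

  // Training runs on the executors, which is where the GPUs (and hence a
  // GPU-enabled image, when running on Kubernetes) are needed.
  val model = new XGBoostClassifier(params)
    .setFeaturesCol("features")
    .setLabelCol("label")
    .fit(training)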

Thanks,

Rong

Re: building docker images for GPU

Posted by Chen Qin <qi...@gmail.com>.
Just noticed that the current Spark task scheduling doesn't recognize any
/device as a constraint. What might happen as a result is multiple tasks
getting stuck racing to acquire the GPU/FPGA (you name it).

I'm not sure whether "multiple processes" on one GPU work the same way they
do on a CPU. If not, we should consider some kind of binding in the task
scheduler and executorInfo, e.g.:
  task 0: executor 1, 2 cpu, /device/gpu/0
  task 1: executor 1, 2 cpu, /device/gpu/1
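
For now that binding has to be done by hand inside the task itself. A naive
sketch of that workaround, assuming every executor can see all GPUs and the
GPU count is known out of band:

  import org.apache.spark.TaskContext

  // Illustrative only: spread concurrent tasks on one executor across devices
  // by partition index so they don't all race for /device/gpu/0. This is a
  // per-task convention, not a real scheduler-level binding.
  def pickGpu(gpuCount: Int): Int =
    TaskContext.getPartitionId() % gpuCount

  // e.g. inside mapPartitions, pass pickGpu(gpuCount) as the library's gpu_id/device parameter.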

Chen

On Tue, Feb 12, 2019 at 11:04 AM Marcelo Vanzin <va...@cloudera.com.invalid>
wrote:

> I think I remember someone mentioning a thread about this on the PR
> discussion, and digging a bit I found this:
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/Toward-an-quot-API-quot-for-spark-images-used-by-the-Kubernetes-back-end-td23622.html
>
> It started a discussion but I haven't really found any conclusion.
>
> In my view here the discussion is the same: what is the contract
> between the Spark code that launches the driver / executor pods, and
> the images?
>
> Right now the contract is defined by the code, which makes it a little
> awkward for people to have their own customized images. They need to
> kinda follow what the images in the repo do and hope they get it
> right.
>
> If instead you define the contract and make the code follow it, then
> it becomes easier for people to provide whatever image they want.
>
> Matt also filed SPARK-24655, which has seen no progress nor discussion.
>
> Someone else filed SPARK-26773, which is similar.
>
> And another person filed SPARK-26597, which is also in the same vein,
> and also suggests something that in the end I agree with: Spark
> shouldn't be opinionated about the image and what it has; it should
> tell the container to run a Spark command to start the driver or
> executor, which should be in the image's path, and shouldn't require
> an entry point at all.
>
> Anyway, just wanted to point out that this discussion isn't as simple
> as "GPU vs. not GPU", but it's a more fundamental discussion about
> what should the container image look like, so that people can
> customize it easily. After all, that's one of the main points of using
> container images, right?
>
> On Mon, Feb 11, 2019 at 11:53 AM Matt Cheah <mc...@palantir.com> wrote:
> >
> > I will reiterate some feedback I left on the PR. Firstly, it’s not
> immediately clear if we should be opinionated around supporting GPUs in the
> Docker image in a first class way.
> >
> >
> >
> > Firstly there’s the question of how we arbitrate the kinds of
> customizations we support moving forward. For example if we say we support
> GPUs now, what’s to say that we should not also support FPGAs?
> >
> >
> >
> > Also what kind of testing can we add to CI to ensure what we’ve provided
> in this Dockerfile works?
> >
> >
> >
> > Instead we can make the Spark images have bare minimum support for basic
> Spark applications, and then provide detailed instructions for how to build
> custom Docker images (mostly just needing to make sure the custom image has
> the right entry point).
> >
> >
> >
> > -Matt Cheah
> >
> >
> >
> > From: Rong Ou <ro...@gmail.com>
> > Date: Friday, February 8, 2019 at 2:28 PM
> > To: "dev@spark.apache.org" <de...@spark.apache.org>
> > Subject: building docker images for GPU
> >
> >
> >
> > Hi spark dev,
> >
> >
> >
> > I created a JIRA issue a while ago (
> https://issues.apache.org/jira/browse/SPARK-26398) to
> add GPU support to Spark docker images, and sent a PR (
> https://github.com/apache/spark/pull/23347) that went
> through several iterations. It was suggested that it should be discussed on
> the dev mailing list, so here we are. Please chime in if you have any
> questions or concerns.
> >
> >
> >
> > A little more background. I mainly looked at running XGBoost on Spark
> using GPUs. Preliminary results have shown that there is potential for
> significant speedup in training time. This seems like a popular use case
> for Spark. In any event, it'd be nice for Spark to have better support for
> GPUs. Building gpu-enabled docker images seems like a useful first step.
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Rong
> >
> >
>
>
>
> --
> Marcelo
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: building docker images for GPU

Posted by Marcelo Vanzin <va...@cloudera.com.INVALID>.
I think I remember someone mentioning a thread about this on the PR
discussion, and digging a bit I found this:
http://apache-spark-developers-list.1001551.n3.nabble.com/Toward-an-quot-API-quot-for-spark-images-used-by-the-Kubernetes-back-end-td23622.html

It started a discussion but I haven't really found any conclusion.

In my view, the discussion here is the same: what is the contract
between the Spark code that launches the driver / executor pods, and
the images?

Right now the contract is defined by the code, which makes it a little
awkward for people to maintain their own customized images. They have to
more or less follow what the images in the repo do and hope they get it
right.

If instead you define the contract and make the code follow it, then
it becomes easier for people to provide whatever image they want.

Matt also filed SPARK-24655, which has seen no progress or discussion.

Someone else filed SPARK-26773, which is similar.

And another person filed SPARK-26597, which is also in the same vein,
and also suggests something that in the end I agree with: Spark
shouldn't be opinionated about the image and what it has; it should
tell the container to run a Spark command to start the driver or
executor, which should be in the image's path, and shouldn't require
an entry point at all.
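
To sketch what that could look like (this is not existing Spark code, just an
illustration of "define the contract and make the code follow it"): the
contract boils down to a couple of commands that any conforming image must be
able to resolve from its PATH, with no ENTRYPOINT required:

  // Hypothetical sketch, not existing Spark code: the launcher pins down the
  // commands it will run inside the container; any image that has the standard
  // Spark launch scripts on its PATH satisfies the contract.
  object SparkImageContract {
    def driverCommand(mainClass: String, appArgs: Seq[String]): Seq[String] =
      Seq("spark-submit", "--deploy-mode", "client", "--class", mainClass) ++ appArgs

    def executorCommand(driverUrl: String, executorId: String): Seq[String] =
      Seq("spark-class",
          "org.apache.spark.executor.CoarseGrainedExecutorBackend",
          "--driver-url", driverUrl,
          "--executor-id", executorId)
  }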

Anyway, I just wanted to point out that this discussion isn't as simple
as "GPU vs. not GPU"; it's a more fundamental discussion about what the
container image should look like, so that people can customize it
easily. After all, that's one of the main points of using container
images, right?

On Mon, Feb 11, 2019 at 11:53 AM Matt Cheah <mc...@palantir.com> wrote:
>
> I will reiterate some feedback I left on the PR. Firstly, it’s not immediately clear if we should be opinionated around supporting GPUs in the Docker image in a first class way.
>
>
>
> Firstly there’s the question of how we arbitrate the kinds of customizations we support moving forward. For example if we say we support GPUs now, what’s to say that we should not also support FPGAs?
>
>
>
> Also what kind of testing can we add to CI to ensure what we’ve provided in this Dockerfile works?
>
>
>
> Instead we can make the Spark images have bare minimum support for basic Spark applications, and then provide detailed instructions for how to build custom Docker images (mostly just needing to make sure the custom image has the right entry point).
>
>
>
> -Matt Cheah
>
>
>
> From: Rong Ou <ro...@gmail.com>
> Date: Friday, February 8, 2019 at 2:28 PM
> To: "dev@spark.apache.org" <de...@spark.apache.org>
> Subject: building docker images for GPU
>
>
>
> Hi spark dev,
>
>
>
> I created a JIRA issue a while ago (https://issues.apache.org/jira/browse/SPARK-26398) to add GPU support to Spark docker images, and sent a PR (https://github.com/apache/spark/pull/23347) that went through several iterations. It was suggested that it should be discussed on the dev mailing list, so here we are. Please chime in if you have any questions or concerns.
>
>
>
> A little more background. I mainly looked at running XGBoost on Spark using GPUs. Preliminary results have shown that there is potential for significant speedup in training time. This seems like a popular use case for Spark. In any event, it'd be nice for Spark to have better support for GPUs. Building gpu-enabled docker images seems like a useful first step.
>
>
>
> Thanks,
>
>
>
> Rong
>
>



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: building docker images for GPU

Posted by Matt Cheah <mc...@palantir.com>.
I will reiterate some feedback I left on the PR: it's not immediately clear whether we should be opinionated about supporting GPUs in the Docker image in a first-class way.

 

First, there's the question of how we arbitrate the kinds of customizations we support going forward. For example, if we say we support GPUs now, what's to say we should not also support FPGAs?

 

Second, what kind of testing can we add to CI to ensure that what we've provided in this Dockerfile works?

 

Instead, we could make the Spark images have bare-minimum support for basic Spark applications, and then provide detailed instructions for how to build custom Docker images (mostly just making sure the custom image has the right entry point).
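
For reference, pointing a job at such a custom image is already just
configuration; the open question is what the image has to contain for the
launcher's assumptions (such as the entry point) to hold. A sketch with
placeholder values:

  import org.apache.spark.SparkConf

  // Master URL and image name are placeholders. The custom image still has to
  // satisfy whatever the submission code expects (today, the entry point
  // baked into the reference Dockerfile).
  val conf = new SparkConf()
    .setMaster("k8s://https://kubernetes.example.com:6443")
    .set("spark.kubernetes.container.image", "myregistry/spark-custom:latest")
    .set("spark.executor.instances", "2")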

 

-Matt Cheah

 

From: Rong Ou <ro...@gmail.com>
Date: Friday, February 8, 2019 at 2:28 PM
To: "dev@spark.apache.org" <de...@spark.apache.org>
Subject: building docker images for GPU

 

Hi spark dev, 

 

I created a JIRA issue a while ago (https://issues.apache.org/jira/browse/SPARK-26398) to add GPU support to Spark docker images, and sent a PR (https://github.com/apache/spark/pull/23347) that went through several iterations. It was suggested that it should be discussed on the dev mailing list, so here we are. Please chime in if you have any questions or concerns.

 

A little more background. I mainly looked at running XGBoost on Spark using GPUs. Preliminary results have shown that there is potential for significant speedup in training time. This seems like a popular use case for Spark. In any event, it'd be nice for Spark to have better support for GPUs. Building gpu-enabled docker images seems like a useful first step.

 

Thanks,

 

Rong