Posted to user@mesos.apache.org by "Cecile, Adam" <Ad...@hitec.lu> on 2017/01/17 11:01:02 UTC

CUDA support makes slave receiving no jobs

Hello,


I just tried to enable CUDA support, but once it is enabled the slave refuses to start anything (the Marathon job stays stuck in the deploying state).

If I change the isolation setting from "cgroups/cpu,cgroups/mem,cgroups/devices,gpu/nvidia" to "cgroups/cpu,cgroups/mem,cgroups/devices", jobs get started again.


Of course, I couldn't find anything useful in the log file (attached).


Could someone have a look and let me know if something is broken or badly configured?


Thanks in advance,

Best regards, Adam.


Re: CUDA support makes slave receiving no jobs

Posted by Kevin Klues <kl...@gmail.com>.
If you are running on standalone mesos+marathon, make sure you enable the
marathon flag for '--enable_features=gpu_resources' (and make sure you have
a version of marathon that supports this, i.e. 1.3). If you are on DC/OS,
then make sure you are running a very recent build (no version that's been
released supports this yet), and follow the instructions at
https://github.com/dcos/dcos/pull/766
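
As a rough way to sanity-check the two settings discussed in this thread, here is a minimal Python sketch. It only inspects the flag values as strings; it does not talk to Mesos or Marathon. The helper names (missing_gpu_isolators, marathon_has_gpu_feature) are hypothetical and made up for illustration, and the assumption that "gpu/nvidia" needs "cgroups/devices" alongside it is taken from the isolation string quoted above.

```python
# Hypothetical helpers for illustration only; they check flag strings
# and are not part of Mesos or Marathon.

# Assumption: GPU isolation needs both of these isolators on the agent,
# as in the isolation string quoted in the original post.
REQUIRED_ISOLATORS = {"cgroups/devices", "gpu/nvidia"}


def _split_csv(value):
    """Split a comma-separated flag value into a set of trimmed entries."""
    return {item.strip() for item in value.split(",") if item.strip()}


def missing_gpu_isolators(isolation):
    """Return the GPU-related isolators absent from an --isolation value."""
    return REQUIRED_ISOLATORS - _split_csv(isolation)


def marathon_has_gpu_feature(enable_features):
    """Check whether an --enable_features value includes gpu_resources."""
    return "gpu_resources" in _split_csv(enable_features)


if __name__ == "__main__":
    isolation = "cgroups/cpu,cgroups/mem,cgroups/devices,gpu/nvidia"
    print("missing isolators:", sorted(missing_gpu_isolators(isolation)))
    print("gpu feature enabled:", marathon_has_gpu_feature("gpu_resources"))
```

If the first check reports nothing missing but the second is false, that would match the symptom described here: the agent is set up for GPUs, but Marathon was not started with '--enable_features=gpu_resources'.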
