Posted to user@mesos.apache.org by "Cecile, Adam" <Ad...@hitec.lu> on 2017/01/17 11:01:02 UTC
CUDA support makes slave receiving no jobs
Hello,
I just tried to enable CUDA support, but once it is enabled the slave refuses to start anything (the Marathon job stays stuck in the deploying state).
If I change the isolation setting from "cgroups/cpu,cgroups/mem,cgroups/devices,gpu/nvidia" to "cgroups/cpu,cgroups/mem,cgroups/devices", jobs get started again.
Of course, I could not find anything useful in the log file (attached).
Can someone have a look and let me know if something is broken or badly configured?
Thanks in advance,
Best regards, Adam.
Re: CUDA support makes slave receiving no jobs
Posted by Kevin Klues <kl...@gmail.com>.
If you are running standalone Mesos + Marathon, make sure you start
Marathon with the '--enable_features=gpu_resources' flag (and make sure
your version of Marathon supports it, i.e. 1.3). If you are on DC/OS,
make sure you are running a very recent build (no released version
supports this yet), and follow the instructions at
https://github.com/dcos/dcos/pull/766
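
For anyone scripting a standalone setup, here is a minimal sketch of the check described above: if the slave's isolation string contains "gpu/nvidia", Marathon should be started with the gpu_resources feature. The helper names (needs_gpu_feature, marathon_feature_args) are hypothetical and not part of Mesos or Marathon. The sketch also assumes that --enable_features accepts a comma-separated list.

```python
def needs_gpu_feature(isolation: str) -> bool:
    """Return True if the isolation string lists the Nvidia GPU isolator."""
    isolators = [part.strip() for part in isolation.split(",")]
    return "gpu/nvidia" in isolators


def marathon_feature_args(isolation: str, features=()) -> list:
    """Build the --enable_features argument for Marathon (hypothetical helper).

    Assumption: Marathon takes features as one comma-separated value.
    """
    enabled = list(features)
    if needs_gpu_feature(isolation) and "gpu_resources" not in enabled:
        enabled.append("gpu_resources")
    if not enabled:
        return []
    return ["--enable_features=" + ",".join(enabled)]


if __name__ == "__main__":
    # Isolation setting quoted in the original post.
    isolation = "cgroups/cpu,cgroups/mem,cgroups/devices,gpu/nvidia"
    print(" ".join(["marathon"] + marathon_feature_args(isolation)))
```

This only assembles the argument and does not start anything. The other Marathon options (master, ZooKeeper, and so on) are left out and depend on your installation.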