Posted to dev@singa.apache.org by "Liu Hui (JIRA)" <ji...@apache.org> on 2019/03/25 06:39:00 UTC

[jira] [Created] (SINGA-435) Rafiki--Can't create a train job with 'ENABLE_GPU'

Liu Hui created SINGA-435:
-----------------------------

             Summary: Rafiki--Can't create a train job with 'ENABLE_GPU'
                 Key: SINGA-435
                 URL: https://issues.apache.org/jira/browse/SINGA-435
             Project: Singa
          Issue Type: Bug
            Reporter: Liu Hui
         Attachments: rafiki_admin001.png

>>https://nginyc.github.io/rafiki/docs/latest/docs/src/user/quickstart.html

I followed the quickstart and tried to create a train job that uses a GPU.
When creating the train job, I changed the budget parameter to "budget={'ENABLE_GPU': 1, 'MODEL_TRIAL_COUNT': 2}".

However, the rafiki_worker container did not start.

I entered the rafiki_admin container and found an error in its log file.

Eventually I found that, in rafiki/rafiki/container/docker_swarm.py, the function _if_any_node_has_gpu always returns False.
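A first thing to check is whether any Swarm node advertises a GPU at all. The sketch below is NOT Rafiki's actual _if_any_node_has_gpu; it is a hypothetical stand-in that assumes such a check reads the generic resources each node reports under Description.Resources.GenericResources in "docker node inspect" (the same data exposed as Node.attrs by the Docker SDK for Python). As far as I know, Swarm does not detect GPUs on its own, so that list stays empty unless the Docker daemon on the node is configured to advertise them (for example with the node-generic-resources option). Treating any resource kind whose name contains "gpu" as a GPU is also an assumption; the helper names node_gpu_count and any_node_has_gpu are illustrative.

```python
from typing import Any, Dict, Iterable


def node_gpu_count(node_attrs: Dict[str, Any]) -> int:
    """Count the GPUs a Swarm node advertises as generic resources.

    node_attrs has the shape of the JSON from `docker node inspect`
    (docker-py's Node.attrs). ASSUMPTION: any resource kind containing
    "gpu" (case-insensitive) is counted as a GPU.
    """
    resources = node_attrs.get("Description", {}).get("Resources", {})
    count = 0
    # GenericResources is absent or empty when the daemon advertises nothing.
    for res in resources.get("GenericResources") or []:
        discrete = res.get("DiscreteResourceSpec")
        named = res.get("NamedResourceSpec")
        if discrete and "gpu" in discrete.get("Kind", "").lower():
            # A discrete spec carries a count, e.g. {"Kind": "gpu", "Value": 2}.
            count += int(discrete.get("Value", 0))
        elif named and "gpu" in named.get("Kind", "").lower():
            # Each named spec stands for a single device, e.g. a GPU UUID.
            count += 1
    return count


def any_node_has_gpu(nodes: Iterable[Dict[str, Any]]) -> bool:
    """Hypothetical check: True if at least one node advertises a GPU."""
    return any(node_gpu_count(n) > 0 for n in nodes)


# Against a real swarm (requires the `docker` package and a manager node):
#   import docker
#   client = docker.from_env()
#   print(any_node_has_gpu(n.attrs for n in client.nodes.list()))
```

If this prints False on the swarm, the problem is likely in the Docker setup (no GPU advertised to Swarm) rather than in the train job parameters.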

What do I need to do to run training on a GPU inside a container? Have I missed a step, for example in setting up Docker's environment, or somewhere else?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)