Posted to dev@singa.apache.org by "Ngin Yun Chuan (JIRA)" <ji...@apache.org> on 2018/11/19 02:40:00 UTC

[jira] [Comment Edited] (SINGA-406) [Rafiki] Add POS tagging task & add GPU support (0.0.7)

    [ https://issues.apache.org/jira/browse/SINGA-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691196#comment-16691196 ] 

Ngin Yun Chuan edited comment on SINGA-406 at 11/19/18 2:39 AM:
----------------------------------------------------------------

The `nvidia/cuda:9.0-runtime-ubuntu16.04` image seems to run workers correctly on my Mac without a GPU, and in combination with setting `CUDA_VISIBLE_DEVICES` dynamically during worker deployment, we can keep a single worker image that works on both CPU-only machines and machines with GPUs. Would there be any problems with this setup?
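
For concreteness, here is a minimal sketch of what I mean (the image name is illustrative, and this assumes GPU machines have nvidia-docker2 installed):

```
# On a machine with GPUs: expose only GPU 0 to this worker
docker run --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 rafiki_worker

# On a CPU-only machine: hide all GPUs so frameworks fall back to CPU
docker run -e CUDA_VISIBLE_DEVICES="" rafiki_worker
```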

If we have another worker image for CPU-only machines, e.g. `rafiki_worker_cpu`, does it mean that model developers who want to provide their own custom Docker image need to extend from *both* worker Docker images to support model training on both CPU and GPU? Or should we drop this configurable option?
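
To illustrate the duplication I am worried about (file and image names here are hypothetical):

```
# A model developer would have to maintain two Dockerfiles that
# differ only in their FROM line:
#   Dockerfile.cpu: FROM rafiki_worker_cpu
#   Dockerfile.gpu: FROM rafiki_worker
docker build -t my_model_worker:cpu -f Dockerfile.cpu .
docker build -t my_model_worker:gpu -f Dockerfile.gpu .
```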

If we let app developers configure the Docker container at runtime, does it mean that they will now have to know about the models that would be trained on their dataset and understand the dependencies of each model (which model developers might need to document)? If they are allowed to provide any Docker container, they must extend Rafiki's worker image, build the image themselves, push it to Docker Hub, and account for the dependencies of each model during training. I feel like doing it this way makes things complex for the app developer.
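
That workflow would look something like this (repository and image names are hypothetical):

```
# App developer extends Rafiki's worker image in their own Dockerfile,
# builds it, and publishes it so the cluster can pull it at runtime
docker build -t myorg/custom_rafiki_worker .
docker push myorg/custom_rafiki_worker
```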



> [Rafiki] Add POS tagging task & add GPU support (0.0.7)
> -------------------------------------------------------
>
>                 Key: SINGA-406
>                 URL: https://issues.apache.org/jira/browse/SINGA-406
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: Ngin Yun Chuan
>            Priority: Major
>
> Refer to https://github.com/nginyc/rafiki/pull/71 for details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)