Posted to issues@mesos.apache.org by "chester kuo (JIRA)" <ji...@apache.org> on 2015/02/02 04:54:34 UTC

[jira] [Commented] (MESOS-2262) Adding GPGPU resource into Mesos framework, so we can know if any GPGPU resource are available for master

    [ https://issues.apache.org/jira/browse/MESOS-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300857#comment-14300857 ] 

chester kuo commented on MESOS-2262:
------------------------------------

Hi Adam,

Ya, using resources will be good here, but I don't plan to isolate individual GPU cores, since the runtime library should handle this rather than the application assigning them directly; that is, we don't need to know how work-items are scheduled.
(Although we can create multiple sub-devices, and each sub-device can run an independent command queue.)

But we need the master to know whether there are multiple GPU devices (resources) on a slave, so they can be utilized when available.

Chester

> Adding GPGPU resource into Mesos framework, so we can know if any GPGPU resource are available for master
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-2262
>                 URL: https://issues.apache.org/jira/browse/MESOS-2262
>             Project: Mesos
>          Issue Type: Task
>          Components: framework, slave
>         Environment: OpenCL support env, such as OS X, Linux, Windows..
>            Reporter: chester kuo
>            Priority: Minor
>
> Extending Mesos to support heterogeneous resources such as GPGPU/FPGA etc. as computing resources in the data center. OpenCL will be the first target to add into Mesos (supported by all major GPU vendors); I will reserve support for others, such as CUDA, for the future.
> In this feature, the slave will be able to discover resources including, but not limited to:
> (1) Heterogeneous computing protocol type: "OpenCL", "CUDA", "HSA"
> (2) Computing global memory (MB)
> (3) Computing runtime version, such as "1.2", "2.0"
> (4) Computing compute units (double)
> (5) Computing device type: GPGPU, CPU, accelerator device
> (6) Number of computing devices (double)
> Heterogeneous resource isolation will be supported in the framework instead of on the slave/device side. The major reason is the ecosystem: OpenCL operates on top of private device drivers owned by the vendors, and only the runtime library (OpenCL) is a user-space component, so it is hard for us to do CPU/memory-style resource isolation the way Linux cgroups do. As a result, we may use the runtime library to do device isolation and memory allocation.
> (PS: if anyone knows how to do it at the GPGPU driver level, please drop me a note.)
> Meanwhile, some runtime libraries (such as OpenCL) can also run on top of the CPU, so we need to use the isolator API to report this once it is allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)