Posted to issues@systemml.apache.org by "Niketan Pansare (JIRA)" <ji...@apache.org> on 2017/03/14 18:04:41 UTC

[jira] [Resolved] (SYSTEMML-1396) Enable lazily freeing cuda allocated memory chunks

     [ https://issues.apache.org/jira/browse/SYSTEMML-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niketan Pansare resolved SYSTEMML-1396.
---------------------------------------
    Resolution: Fixed

> Enable lazily freeing cuda allocated memory chunks
> --------------------------------------------------
>
>                 Key: SYSTEMML-1396
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1396
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Runtime
>            Reporter: Nakul Jindal
>            Assignee: Nakul Jindal
>             Fix For: SystemML 1.0
>
>
> Currently, CUDA memory chunks are deallocated asynchronously. Because {{cudaFree}} operations are expensive, the idea was to issue them asynchronously so that they could complete while the CPU was busy with other work. In tight loops where most operations run on the GPU, however, the asynchronous {{cudaFree}} calls were not really asynchronous: operations waiting to use the GPU paid the penalty for the {{cudaFree}} operation.
> After adding extra instrumentation, it was determined that {{cudaMalloc}} operations were fairly expensive as well.
> Most GPU operations run in loops that repeatedly allocate and deallocate memory chunks of the same size on every iteration. It would be more efficient to keep such a chunk and "clear it out" (set its memory to 0) for reuse, instead of freeing it and allocating a new one.
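
The idea described above can be sketched as a size-keyed cache of released chunks. This is only a host-side simulation in Python, not SystemML's implementation: the class LazyFreePool and all of its methods and counters are hypothetical names, a bytearray stands in for a device chunk, creating one stands in for a device allocation, and zeroing one in place stands in for a device memset.

```python
from collections import defaultdict


class LazyFreePool:
    """Hypothetical size-keyed cache of released chunks (host-side simulation)."""

    def __init__(self):
        # chunk size in bytes -> released chunks of exactly that size
        self._free = defaultdict(list)
        self.fresh_allocations = 0  # times a real allocation would be needed
        self.real_frees = 0         # times a real free would be issued

    def allocate(self, size):
        cached = self._free[size]
        if cached:
            chunk = cached.pop()
            # Reuse: zero the chunk in place (stand-in for a device memset).
            chunk[:] = bytes(size)
            return chunk
        # No cached chunk of this size: pay for a fresh allocation.
        self.fresh_allocations += 1
        return bytearray(size)

    def release(self, chunk):
        # Lazy free: keep the chunk for later reuse instead of freeing it now.
        self._free[len(chunk)].append(chunk)

    def clear(self):
        # Actually free everything that is cached, e.g. under memory pressure.
        for chunks in self._free.values():
            self.real_frees += len(chunks)
        self._free.clear()


pool = LazyFreePool()
for _ in range(1000):
    chunk = pool.allocate(4096)  # same size on every iteration
    chunk[0] = 1                 # pretend a GPU kernel wrote to the chunk
    pool.release(chunk)
print(pool.fresh_allocations)    # 1
```

In this sketch the loop pays for one allocation and no frees, at the cost of zeroing the chunk on each reuse. A real pool would also need a policy for when to call something like clear(), since cached chunks still occupy device memory.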


