You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/09/01 23:36:49 UTC

[GitHub] [incubator-tvm] vathysjacob opened a new issue #6376: [BUG] Memory leak in Metal runtime device api

vathysjacob opened a new issue #6376:
URL: https://github.com/apache/incubator-tvm/issues/6376


   
   I have been playing with swapping in and out models and targets in an iOS app and noticed that when a Metal target model is loaded the app memory usage increases monotonically, in my case by about 22MB at a time.
   
   This is being caused by a Metal buffer being retained, but never released:
   
   https://github.com/apache/incubator-tvm/blob/master/src/runtime/metal/metal_device_api.mm#L162
   
   Making the following change appears to resolve the issue for me, but I'm not yet familiar enough with Metal or iOS/Objective-C best practices to be comfortable offering my own PR.
   
   ```
     @@ -159,7 +159,7 @@ int GetWarpSize(id<MTLDevice> dev) {
      */
      id<MTLBuffer> buf = [dev newBufferWithLength:nbytes options:storage_mode];
      CHECK(buf != nil);
   -  return (__bridge void*)([buf retain]);
   +  return (__bridge void*)buf;
    }
    ```
   
   It is not clear to me if the other retain calls are actually necessary, perhaps someone more knowledgeable in this area can weigh in?
   
   
   ```
   diff --git a/src/runtime/metal/metal_device_api.mm b/src/runtime/metal/metal_device_api.mm
   index fddeadf86..0e8b8aae0 100644
   --- a/src/runtime/metal/metal_device_api.mm
   +++ b/src/runtime/metal/metal_device_api.mm
   @@ -126,14 +126,14 @@ int GetWarpSize(id<MTLDevice> dev) {
    #if TARGET_OS_IPHONE
      // on iPhone
      id<MTLDevice> d = MTLCreateSystemDefaultDevice();
   -  devices.push_back([d retain]);
   -  queues.push_back([[d newCommandQueue] retain]);
   +  devices.push_back(d );
   +  queues.push_back([d newCommandQueue] );
    #else
      NSArray<id<MTLDevice> >* devs = MTLCopyAllDevices();
      for (size_t i = 0; i < devs.count; ++i) {
        id<MTLDevice> d = [devs objectAtIndex:i];
   -    devices.push_back([d retain]);
   -    queues.push_back([[d newCommandQueue] retain]);
   +    devices.push_back(d );
   +    queues.push_back([d newCommandQueue]);
        LOG(INFO) << "Intializing Metal device " << i << ", name=" << [d.name UTF8String];
        warp_size.push_back(GetWarpSize(d));
      }
   @@ -159,7 +159,7 @@ int GetWarpSize(id<MTLDevice> dev) {
      */
      id<MTLBuffer> buf = [dev newBufferWithLength:nbytes options:storage_mode];
      CHECK(buf != nil);
   -  return (__bridge void*)([buf retain]);
   +  return (__bridge void*)buf;
    }
   
    void MetalWorkspace::FreeDataSpace(TVMContext ctx, void* ptr) {
   @@ -264,8 +264,8 @@ int GetWarpSize(id<MTLDevice> dev) {
        if (temp_buffer_[ctx.device_id] != nil) {
          [temp_buffer_[ctx.device_id] release];
        }
   -    temp_buffer_[ctx.device_id] = [[dev newBufferWithLength:size
   -                                                    options:MTLStorageModeShared] retain];
   +    temp_buffer_[ctx.device_id] = [dev newBufferWithLength:size
   +                                                    options:MTLStorageModeShared];
      }
      return temp_buffer_[ctx.device_id];
    }
    ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] vathysjacob commented on issue #6376: [BUG] Memory leak in Metal runtime device api

Posted by GitBox <gi...@apache.org>.
vathysjacob commented on issue #6376:
URL: https://github.com/apache/incubator-tvm/issues/6376#issuecomment-687279729


   @tqchen - Sure, I'll take a look. It may take me a couple days to get around to investigating if there are other leaks in the retain/release handling.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen commented on issue #6376: [BUG] Memory leak in Metal runtime device api

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #6376:
URL: https://github.com/apache/incubator-tvm/issues/6376#issuecomment-690358667


   Thanks @jacobpostman @vathysjacob it would also be nice if you can experiment further about what happens to the rest of the memory


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen commented on issue #6376: [BUG] Memory leak in Metal runtime device api

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #6376:
URL: https://github.com/apache/incubator-tvm/issues/6376#issuecomment-689888634


   @vathysjacob any updates?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen commented on issue #6376: [BUG] Memory leak in Metal runtime device api

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #6376:
URL: https://github.com/apache/incubator-tvm/issues/6376#issuecomment-686641648


   Thanks @vathysjacob ! please see if https://github.com/apache/incubator-tvm/pull/6393 helps


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] vathysjacob commented on issue #6376: [BUG] Memory leak in Metal runtime device api

Posted by GitBox <gi...@apache.org>.
vathysjacob commented on issue #6376:
URL: https://github.com/apache/incubator-tvm/issues/6376#issuecomment-686766012


   @tqchen - negative on #6393, memory usage continues to increase.
   
   It looks like if manual reference counting is used, setPurgeableState needs to be set to empty before MTLBuffer will be released.
   
   The following clears up a bulk of the retained memory allocations for me, I'll need to take a look more closely to see what else may be being retained. With this change, memory is now ticking up on the order of 10s of KB when reloading a model instead of the ~22MB I saw previously:
   
   ```
   void MetalWorkspace::FreeDataSpace(TVMContext ctx, void* ptr) {
     // release the ptr.
   + [(id<MTLBuffer>)ptr setPurgeableState:MTLPurgeableStateEmpty];
     CFRelease(ptr);
   }
   ```
   
   FYI - if it is relevant:
   - I'm building for iOS 13.6
   - Objective-C Automatic Reference Counting is set to No in the project build settings based on the settings used in the tvmrpc  project.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen commented on issue #6376: [BUG] Memory leak in Metal runtime device api

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #6376:
URL: https://github.com/apache/incubator-tvm/issues/6376#issuecomment-686783855


   Thanks @vathysjacob do you mind send a PR by adding the `setPurgeableState:MTLPurgeableStateEmpty`?
   
   It would also be interesting to ask if we should turn on the ARC, and what is the difference


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org