
Posted to commits@tvm.apache.org by "AndrewZhaoLuo (via GitHub)" <gi...@apache.org> on 2023/01/31 00:45:16 UTC

[GitHub] [tvm] AndrewZhaoLuo opened a new pull request, #13877: [VM] Lower memory usage when loading and dumping weights

AndrewZhaoLuo opened a new pull request, #13877:
URL: https://github.com/apache/tvm/pull/13877

Currently, the VM executable has a wasteful pattern when loading weights: it reads the entire serialized representation into memory and then deserializes from that in-memory buffer without progressively freeing it.

This is a problem because if the weights take up ~5GB, the serialized representation in memory takes ~5GB and the deserialized representation takes another ~5GB. As a result, peak memory usage when executing with the VM is roughly 2x the size of the model weights.

This matters, especially for some of the larger models out there today.

This PR fixes that by streaming directly from disk and relying on the buffering built into the standard C file interface to keep performance reasonable.
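A minimal sketch of the difference, under stated assumptions: the on-disk layout, the function names (`SaveArrays`, `LoadArraysStreaming`), and the use of plain `std::vector<float>` for arrays are all hypothetical and are not TVM's actual serialization code. The old pattern would first read the whole file into one buffer (~1x the weights) and then deserialize into new arrays (another ~1x); the streaming version reads each array directly from a `FILE*` into its final buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Closes the FILE* when the owning unique_ptr goes out of scope.
struct FileCloser {
  void operator()(std::FILE* f) const { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

FilePtr OpenOrThrow(const std::string& path, const char* mode) {
  FilePtr fp(std::fopen(path.c_str(), mode));
  if (!fp) throw std::runtime_error("cannot open " + path);
  return fp;
}

// Reads exactly `size` bytes or throws.
void ReadExact(std::FILE* fp, void* dst, std::size_t size) {
  if (size > 0 && std::fread(dst, 1, size, fp) != size) {
    throw std::runtime_error("unexpected end of file");
  }
}

// Writes exactly `size` bytes or throws.
void WriteExact(std::FILE* fp, const void* src, std::size_t size) {
  if (size > 0 && std::fwrite(src, 1, size, fp) != size) {
    throw std::runtime_error("short write");
  }
}

using Arrays = std::vector<std::vector<float>>;

// Hypothetical layout for this sketch only (not TVM's real format):
// a uint64 array count, then per array a uint64 length followed by its floats.
void SaveArrays(const std::string& path, const Arrays& arrays) {
  FilePtr fp = OpenOrThrow(path, "wb");
  std::uint64_t count = arrays.size();
  WriteExact(fp.get(), &count, sizeof(count));
  for (const auto& a : arrays) {
    std::uint64_t n = a.size();
    WriteExact(fp.get(), &n, sizeof(n));
    WriteExact(fp.get(), a.data(), a.size() * sizeof(float));
  }
}  // FileCloser flushes and closes the file here

// Streaming load: bytes go from the FILE* straight into each array's final
// buffer. Apart from stdio's internal buffer, no full copy of the file is
// held, so peak memory stays close to 1x the weight size.
Arrays LoadArraysStreaming(const std::string& path) {
  FilePtr fp = OpenOrThrow(path, "rb");
  std::uint64_t count = 0;
  ReadExact(fp.get(), &count, sizeof(count));
  Arrays arrays;
  for (std::uint64_t i = 0; i < count; ++i) {
    std::uint64_t n = 0;
    ReadExact(fp.get(), &n, sizeof(n));
    std::vector<float> a(static_cast<std::size_t>(n));
    ReadExact(fp.get(), a.data(), a.size() * sizeof(float));
    arrays.push_back(std::move(a));
  }
  return arrays;
}
```

The only extra memory during the load is the C library's stream buffer, whose size is independent of the model size.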

Before-and-after memory graphs from loading and benchmarking a model with ~5GB of weights:

Before:

![image](https://user-images.githubusercontent.com/13855451/215629180-4d07e0b4-cb6e-4535-88ce-f8b4346f8698.png)

After:

![image](https://user-images.githubusercontent.com/13855451/215629115-a6ac9f3a-98e4-4d37-a7a3-fb9a6d26a3c3.png)

This is a draft because:
- I've only tested loading weights, but similar savings should be possible for other streams.
- We need to make a decision on the DMLC stream interface. The main issue is that a lot of existing code depends on the DMLC stream interface, but TVM only uses DMLC as a header-only library, so in the current state only in-memory streams are available. I've worked around this by implementing a simple file stream class.
- We need to decide on the best way forward. The approach in this PR is simple, though it technically duplicates some code from the DMLC core library.
- Alternatives are: adding DMLC as a full dependency, adding the functionality to DMLC and pulling in those changes, or getting rid of the DMLC stream interface entirely.
- The approach here is the simplest, which is why I chose it for the draft.
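A minimal sketch of what such a simple class could look like. It assumes a stand-in `Stream` base modeled loosely on the `Read`/`Write` methods of `dmlc::Stream` (the real class in `dmlc/io.h` has more members and may differ in signature); `SimpleFileStream`, `WriteU32`, and `ReadU32` are hypothetical names, not the class added in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Stand-in for the Read/Write core of dmlc::Stream, defined here so the
// sketch compiles on its own. Not the real DMLC class.
class Stream {
 public:
  virtual ~Stream() = default;
  virtual std::size_t Read(void* ptr, std::size_t size) = 0;
  virtual void Write(const void* ptr, std::size_t size) = 0;
};

// Hypothetical file-backed stream. Because it implements the same interface
// as an in-memory stream, serialization code written against Stream* can
// read from or write to disk without changes.
class SimpleFileStream : public Stream {
 public:
  SimpleFileStream(const std::string& path, const char* mode)
      : fp_(std::fopen(path.c_str(), mode)) {
    if (fp_ == nullptr) throw std::runtime_error("cannot open " + path);
  }
  ~SimpleFileStream() override { std::fclose(fp_); }
  SimpleFileStream(const SimpleFileStream&) = delete;
  SimpleFileStream& operator=(const SimpleFileStream&) = delete;

  // Returns the number of bytes actually read (short at end of file).
  std::size_t Read(void* ptr, std::size_t size) override {
    return std::fread(ptr, 1, size, fp_);
  }
  void Write(const void* ptr, std::size_t size) override {
    if (std::fwrite(ptr, 1, size, fp_) != size) {
      throw std::runtime_error("short write");
    }
  }

 private:
  std::FILE* fp_;  // buffered by the C library
};

// Example of existing code that only knows about Stream*.
void WriteU32(Stream* s, std::uint32_t v) { s->Write(&v, sizeof(v)); }
bool ReadU32(Stream* s, std::uint32_t* v) {
  return s->Read(v, sizeof(*v)) == sizeof(*v);
}
```

Keeping the interface means callers only change which stream they construct; the tradeoff, as noted above, is duplicating a small amount of DMLC core code.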

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] tvm-bot commented on pull request #13877: [VM] Lower memory usage when loading and dumping weights

Posted by "tvm-bot (via GitHub)" <gi...@apache.org>.

tvm-bot commented on PR #13877:
URL: https://github.com/apache/tvm/pull/13877#issuecomment-1409580474

   
   Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @-ing them in a comment.
   
   
   
   Generated by [tvm-bot](https://github.com/apache/tvm/blob/main/ci/README.md#github-actions)


[GitHub] [tvm] AndrewZhaoLuo commented on pull request #13877: [VM][DMLC] Lower memory usage when loading and dumping weights

Posted by "AndrewZhaoLuo (via GitHub)" <gi...@apache.org>.

AndrewZhaoLuo commented on PR #13877:
URL: https://github.com/apache/tvm/pull/13877#issuecomment-1409580814

   cc @tqchen @jwfromm 


[GitHub] [tvm] AndrewZhaoLuo commented on a diff in pull request #13877: [VM][DMLC] Lower memory usage when loading and dumping weights

Posted by "AndrewZhaoLuo (via GitHub)" <gi...@apache.org>.

AndrewZhaoLuo commented on code in PR #13877:
URL: https://github.com/apache/tvm/pull/13877#discussion_r1093698363


##########
include/tvm/runtime/dmlc_file_stream.h:
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *

Review Comment:
   done



[GitHub] [tvm] tqchen commented on pull request #13877: [VM][DMLC] Lower memory usage when loading and dumping weights

Posted by "tqchen (via GitHub)" <gi...@apache.org>.

tqchen commented on PR #13877:
URL: https://github.com/apache/tvm/pull/13877#issuecomment-1411185588

   The approach of adding a file stream support utility is fine. One thing: it would need to be part of the runtime folder, which should be OK since it is simple enough.
   
   Given that most use cases are on GPU, the ability to load one array into CPU memory, copy it to the GPU, and then immediately free the CPU copy could also be effective.
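A rough sketch of that suggestion with the GPU side stubbed out. `DeviceArray`, `CopyToDevice`, and `LoadToDevice` are hypothetical; a real runtime would allocate device memory and use its device copy API instead of copying into a host vector. The point is the lifetime of the staging buffer: it is freed before the next array is read, so peak host memory is bounded by the largest single array rather than the total.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device allocation.
struct DeviceArray {
  std::vector<float> data;
};

// Hypothetical host-to-device copy; here it just copies into another buffer.
DeviceArray CopyToDevice(const std::vector<float>& host) {
  return DeviceArray{host};
}

// Loads `count` arrays one at a time through a CPU staging buffer.
// `read_array(i)` is a hypothetical callback returning the i-th array on host.
// `peak_host_elems` records the largest staging buffer that was alive.
template <typename ReadFn>
std::vector<DeviceArray> LoadToDevice(std::size_t count, ReadFn read_array,
                                      std::size_t* peak_host_elems) {
  std::vector<DeviceArray> device_arrays;
  *peak_host_elems = 0;
  for (std::size_t i = 0; i < count; ++i) {
    std::vector<float> staging = read_array(i);  // CPU copy of one array
    if (staging.size() > *peak_host_elems) *peak_host_elems = staging.size();
    device_arrays.push_back(CopyToDevice(staging));
    // `staging` is destroyed at the end of this iteration, freeing host memory
    // before the next array is read.
  }
  return device_arrays;
}
```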


[GitHub] [tvm] AndrewZhaoLuo merged pull request #13877: [VM][DMLC] Lower memory usage when loading and dumping weights

Posted by "AndrewZhaoLuo (via GitHub)" <gi...@apache.org>.

AndrewZhaoLuo merged PR #13877:
URL: https://github.com/apache/tvm/pull/13877


[GitHub] [tvm] AndrewZhaoLuo commented on pull request #13877: [VM][DMLC] Lower memory usage when loading and dumping weights

Posted by "AndrewZhaoLuo (via GitHub)" <gi...@apache.org>.

AndrewZhaoLuo commented on PR #13877:
URL: https://github.com/apache/tvm/pull/13877#issuecomment-1412544114

   @tqchen thanks for the comments. 
   
   PTAL, ready for review.


[GitHub] [tvm] tqchen commented on a diff in pull request #13877: [VM][DMLC] Lower memory usage when loading and dumping weights

Posted by "tqchen (via GitHub)" <gi...@apache.org>.

tqchen commented on code in PR #13877:
URL: https://github.com/apache/tvm/pull/13877#discussion_r1093661479


##########
include/tvm/runtime/dmlc_file_stream.h:
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *

Review Comment:
   Can we keep it as part of https://github.com/apache/tvm/blob/main/src/runtime/file_utils.h, assuming it is not going to be exposed through the public API?


