You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/03/18 18:09:25 UTC

[GitHub] [incubator-mxnet] ycnie opened a new pull request #20043: [WIP]C API Enhancement

ycnie opened a new pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043


   ## Description ##
   C api enhancement.
   
   ## Checklist ##
   ### Essentials ###
   - [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [ ] Code is well-documented
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
   - Interesting edge cases to note here
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20043: [WIP]C API Enhancement

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#issuecomment-802173879


   Hey @ycnie , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [unix-cpu, windows-cpu, sanity, clang, edge, unix-gpu, miscellaneous, centos-gpu, centos-cpu, website, windows-gpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
access2rohit commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r629575202



##########
File path: include/mxnet/c_predict_api.h
##########
@@ -56,6 +56,16 @@ typedef void (*PredMonitorCallback)(const char*,
                                     NDArrayHandle,
                                     void*);
 
+/*! \brief enum of NDArray dtypes */
+enum DType {

Review comment:
       In `c_predict_api.h` file we are using it like this `int* input_dtypes` correct? If yes, then do you think the files through which you will call the functions defined here will not have access to `base.h` ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
access2rohit commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r629575202



##########
File path: include/mxnet/c_predict_api.h
##########
@@ -56,6 +56,16 @@ typedef void (*PredMonitorCallback)(const char*,
                                     NDArrayHandle,
                                     void*);
 
+/*! \brief enum of NDArray dtypes */
+enum DType {

Review comment:
       In `c_predict_api.h` file we are using it like this `int* input_dtypes` correct? Do you think the files through which you will call the functions defined here will not have access to `base.h` ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #20043: [WIP]C API Enhancement

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r613502383



##########
File path: src/storage/pooled_storage_manager.h
##########
@@ -69,6 +70,11 @@ class GPUPooledStorageManager final : public StorageManager {
       LOG(FATAL) << "MXNET_GPU_MEM_POOL_PAGE_SIZE cannot be set to a value smaller than " << NDEV \
                  << ". Got " << page_size_ << ".";
     }
+    memory_limit_percentage_ = dmlc::GetEnv<double>("MXNET_GPU_MEM_LIMIT", 100.0);
+    if (memory_limit_percentage_ <= 0 || memory_limit_percentage_ > 100) {
+      LOG(FATAL) << "Invalid memory limit percentage given: " << memory_limit_percentage_
+                 << std::endl;
+    }

Review comment:
       How is this variable semantically different from the `MXNET_GPU_MEM_POOL_RESERVE` variable?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #20043: [WIP]C API Enhancement

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#issuecomment-819749965


   I find some changes in this PR confusing. It would be great to have description on what use cases this PR aims to support.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky edited a comment on pull request #20043: [WIP]C API Enhancement

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#issuecomment-819753228


   > I find some changes in this PR confusing. It would be great to have description on what use cases this PR aims to support.
   
   Still working on the description :-D
   
   Once we finish getting the code in order will update the description and remove WIP and ask for feedback. Thanks for keeping an eye on this PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#issuecomment-833111127


   > Thanks for the contribution! Some of the changes will be useful for 2.0 too. One general question I have is whether there's opportunity to clean up the current c predict APIs in 1.x so that we can offer a clean version for 2.0 too. If you are open to it, it would be great to see a proposal on how a clean c predict API could look like given the opportunity to break backward compatibility.
   
   While helping with this PR, I noticed a lot of similarity in the TVM/DLR APIs:
   https://github.com/neo-ai/neo-ai-dlr/blob/main/include/dlr.h
   or
   https://github.com/neo-ai/neo-ai-dlr/blob/main/include/dlr_tvm.h
   
   Maybe we can use those as a reference as well in this exercise. 
   
   Cleaning up the C Predict API is a great idea, at the same time we should think about other C/C++ APIs like the MXGetBytesInUse or MXLoadLib that we want to include in our C interface moving forward. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #20043: [WIP]C API Enhancement

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r613507427



##########
File path: src/storage/pooled_storage_manager.h
##########
@@ -148,9 +157,21 @@ void GPUPooledStorageManager::Alloc(Storage::Handle* handle) {
     mxnet::common::cuda::DeviceStore device_store(handle->ctx.real_dev_id(), true);
     size_t free, total;
     cudaMemGetInfo(&free, &total);
-    if (free <= total * reserve_ / 100 || size > free - total * reserve_ / 100)
+    double mem_limit_in_bytes = total * memory_limit_percentage_ / 100.0;
+    free = mem_limit_in_bytes - used_memory_;
+    if (free <= mem_limit_in_bytes * reserve_ / 100 ||
+        size > free - mem_limit_in_bytes * reserve_ / 100)
       ReleaseAll();
 
+    if (used_memory_ + size > mem_limit_in_bytes) {
+      // This calls abort() unless
+      // DMLC_LOG_FATAL_THROW != 0, then it
+      // throws std::runtime_error()
+      LOG(FATAL) << "memory limit reached, used: " << used_memory_ << "  limit: "
+                 << mem_limit_in_bytes << " (" << memory_limit_percentage_ << "% of "
+                 << total << ")";
+    }

Review comment:
       This is the enforcement of the user-specified GPU memory limit. If users say "MXNet can use 1GB", then it needs to someone let users know that it cant function properly anymore within the set memory limit.
   
   Its not any different than running on a GPU with small amount of GPU memory than needed (artificial limit).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
access2rohit commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r627820528



##########
File path: include/mxnet/c_predict_api.h
##########
@@ -56,6 +56,16 @@ typedef void (*PredMonitorCallback)(const char*,
                                     NDArrayHandle,
                                     void*);
 
+/*! \brief enum of NDArray dtypes */
+enum DType {

Review comment:
       can we reuse this? https://github.com/apache/incubator-mxnet/blob/master/3rdparty/mshadow/mshadow/base.h#L352
   Keeps the code clean by avoiding multiple copies of different Enums




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on pull request #20043: [WIP]C API Enhancement

Posted by GitBox <gi...@apache.org>.
samskalicky commented on pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#issuecomment-819753228


   > I find some changes in this PR confusing. It would be great to have description on what use cases this PR aims to support.
   
   Still working on the description :-D
   
   Once we finish getting the code in order will update the description and remove WIP and ask for feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
access2rohit commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r630535450



##########
File path: src/storage/pooled_storage_manager.h
##########
@@ -57,18 +58,22 @@ class GPUPooledStorageManager final : public StorageManager {
    * \param initial_context context used by this Storage Manager
    */
   explicit GPUPooledStorageManager(Context initial_context) :
-    initial_context_(initial_context) {
-    reserve_ = dmlc::GetEnv("MXNET_GPU_MEM_POOL_RESERVE", 5);
+  free_list_size_(0), initial_context_(initial_context) {

Review comment:
       nit: indent ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #20043: [WIP]C API Enhancement

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#issuecomment-821326307


   cc @andrei5055 for storage manager changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] access2rohit commented on a change in pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
access2rohit commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r630533696



##########
File path: include/mxnet/c_predict_api.h
##########
@@ -56,6 +56,16 @@ typedef void (*PredMonitorCallback)(const char*,
                                     NDArrayHandle,
                                     void*);
 
+/*! \brief enum of NDArray dtypes */
+enum DType {

Review comment:
       Discussed offline: This header is needed by end user to understand which DTypes are supported. Looks good 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on a change in pull request #20043: [WIP]C API Enhancement

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r613502931



##########
File path: src/storage/pooled_storage_manager.h
##########
@@ -148,9 +157,21 @@ void GPUPooledStorageManager::Alloc(Storage::Handle* handle) {
     mxnet::common::cuda::DeviceStore device_store(handle->ctx.real_dev_id(), true);
     size_t free, total;
     cudaMemGetInfo(&free, &total);
-    if (free <= total * reserve_ / 100 || size > free - total * reserve_ / 100)
+    double mem_limit_in_bytes = total * memory_limit_percentage_ / 100.0;
+    free = mem_limit_in_bytes - used_memory_;
+    if (free <= mem_limit_in_bytes * reserve_ / 100 ||
+        size > free - mem_limit_in_bytes * reserve_ / 100)
       ReleaseAll();
 
+    if (used_memory_ + size > mem_limit_in_bytes) {
+      // This calls abort() unless
+      // DMLC_LOG_FATAL_THROW != 0, then it
+      // throws std::runtime_error()
+      LOG(FATAL) << "memory limit reached, used: " << used_memory_ << "  limit: "
+                 << mem_limit_in_bytes << " (" << memory_limit_percentage_ << "% of "
+                 << total << ")";
+    }

Review comment:
       Why?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] access2rohit edited a comment on pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
access2rohit edited a comment on pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#issuecomment-839166973


   @ycnie Can you also show how did you test that it works ? Like a proper script with commands to build with libmxnet.so and run, in order to check if one can limit memory usage and verify that it works as intended? I know this is non-reachable code by usual means(via python,C++,R,java etc. APIs) but adding a similar test to CI would be best way to go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r627833203



##########
File path: include/mxnet/c_predict_api.h
##########
@@ -56,6 +56,16 @@ typedef void (*PredMonitorCallback)(const char*,
                                     NDArrayHandle,
                                     void*);
 
+/*! \brief enum of NDArray dtypes */
+enum DType {

Review comment:
       no, this is part of the c_predict_api.h header file that does not import any other MX includes. its standalone.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#issuecomment-833062091


   Thanks for the contribution! Some of the changes will be useful for 2.0 too. One general question I have is whether there's opportunity to clean up the current c predict APIs in 1.x so that we can offer a clean version for 2.0 too. If you are open to it, it would be great to see a proposal on how a clean c predict API could look like given the opportunity to break backward compatibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #20043: [WIP]C API Enhancement

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#discussion_r613506332



##########
File path: src/storage/pooled_storage_manager.h
##########
@@ -69,6 +70,11 @@ class GPUPooledStorageManager final : public StorageManager {
       LOG(FATAL) << "MXNET_GPU_MEM_POOL_PAGE_SIZE cannot be set to a value smaller than " << NDEV \
                  << ". Got " << page_size_ << ".";
     }
+    memory_limit_percentage_ = dmlc::GetEnv<double>("MXNET_GPU_MEM_LIMIT", 100.0);
+    if (memory_limit_percentage_ <= 0 || memory_limit_percentage_ > 100) {
+      LOG(FATAL) << "Invalid memory limit percentage given: " << memory_limit_percentage_
+                 << std::endl;
+    }

Review comment:
       according to https://mxnet-bing.readthedocs.io/en/latest/how_to/env_var.html
   
   > MXNET_GPU_MEM_POOL_RESERVE (default=5) is The percentage of GPU memory to reserve for things other than the GPU array, such as kernel launch or cudnn handle space.
   
   So its more like the inverse. Instead of saying how much you want to "leave for other things", `MXNET_GPU_MEM_LIMIT` is for how much memory you want to "give" MXNet in particular. 
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] access2rohit commented on pull request #20043: C API Enhancement

Posted by GitBox <gi...@apache.org>.
access2rohit commented on pull request #20043:
URL: https://github.com/apache/incubator-mxnet/pull/20043#issuecomment-839166973


   @ycnie Can you also show how did you test that it works ? Like a proper script with commands to build with libmxnet.so and run, in order to check if one can limit memory usage and verify that it works as intended?
   I think adding a similar test to CI would be best way to go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org