Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/05/18 02:53:13 UTC
[GitHub] [incubator-mxnet] larroy commented on a change in pull request #14973: [MXNET-1404] Added the GPU memory profiler

larroy commented on a change in pull request #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#discussion_r285326209
 
 

 ##########
 File path: example/gpu_memory_profiler/README.md
 ##########
 @@ -0,0 +1,151 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# GPU Memory Profiler
+
+
+## Motivation
+
+Machine learning training tasks running on GPUs are very frequently limited 
+  by the GPU memory capacity, so frontend programmers need a GPU memory 
+  profiler to understand where the memory goes.
+The problem with the existing GPU memory profilers is that they are too high-level 
+  (similar to `nvidia-smi`, which only reports the sum of all consumption) 
+  and hence make it challenging for frontend programmers to see how that memory is distributed.
+
+In this example, we provide instructions on how to use the **MXNet GPU memory profiler**.
+We start by making the observation that most MXNet programmers have
+  a very good habit of assigning names to computation symbols:
+
+```Python
+# Symbolic Graph Implementation of an LSTM Cell
+i2h = symbol.FullyConnected(data=inputs, weight=self._iW, bias=self._iB,
+                            num_hidden=self._num_hidden*4,
+                            name='%si2h'%name)
+h2h = symbol.FullyConnected(data=states[0], weight=self._hW, bias=self._hB,
+                            num_hidden=self._num_hidden*4,
+                            name='%sh2h'%name)
+```
+
+Those names contain rich information regarding the model that is being described 
+  and should be leveraged during the GPU memory profiling phase.
+Luckily, MXNet already has the ability to propagate this information 
+  all the way down from the Python APIs to its C++ core `nnvm::Graph`.
+When the graph executor initializes its data entries, we extract this information 
+  from the computation graph to tag the data entries (of type `mxnet::NDArray`),
+  and when those data entries are materialized we propagate those names
+  to the storage allocators and record them in the log files,
+  which can then be used for visualization and/or analysis.
+
+![MXNet-GPU_Memory_Profiler-Design](./MXNet-GPU_Memory_Profiler-Design.png)
+
+
+## Instructions
+
+(*The video below was shown at **SysML 2019** ([Demo# 24](https://www.sysml.cc/doc/2019/demo_24.pdf))*.)
+
+![MXNet-GPU_Memory_Profiler](./sysml19_demo/MXNet-GPU_Memory_Profiler.gif)
+
+*In order for the GPU memory profiler to be enabled, you need to **compile from source**.*
+
+- Download the MXNet codebase and install the prerequisite software libraries 
+    (e.g., *OpenCV*, *jemalloc*, etc.).
+
+```bash
+git clone --recursive https://github.com/apache/incubator-mxnet mxnet
+sudo apt-get install ...
+```
+
+- Set the `MXNET_ENABLE_STORAGE_TAGGING` flag in `include/storage_tag.h` to **1**,
+    which controls the storage tagging (disabled by default),
+    **and** the `MXNET_USE_GPU_MEMORY_PROFILER` flag in `src/profiler/gpu_memory_profiler.h` to **1**,
+    which controls dumping the GPU memory allocation information (also disabled by default).
+  Please note that you must have **BOTH flags** set to **1** to use the GPU memory profiler.
+
+- Build the MXNet core library.
+
+```bash
+cd mxnet # workspace/mxnet
 
 Review comment:
   Can we refer to the install instructions instead? These steps will quickly become obsolete. For example, right now it indicates make vs. CMake; maybe it is better to just say "follow the build instructions linked here and add this build flag"...
