Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/05/21 01:27:55 UTC
[GitHub] [incubator-mxnet] larroy commented on a change in pull request #14977: Add an utility for operator benchmarks

URL: https://github.com/apache/incubator-mxnet/pull/14977#discussion_r285821802
 
 

 ##########
 File path: benchmark/opperf/README.md
 ##########
 @@ -0,0 +1,174 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# MXNet Operator Performance Benchmarks
+
+A Python utility for benchmarking and profiling individual MXNet operator execution.
+
+With this utility, for each MXNet operator you can get the following details:
+
+**Timing**
+1. Forward execution time
+2. Backward execution time
+3. Time spent for memory management
+
+**Memory**
+1. Total memory allocated
+
+# Motivation
+
+Benchmarks are usually done end-to-end for a given network architecture, for example ResNet-50 benchmarks on ImageNet data. This is a good measurement of the overall performance and health of a deep learning framework. However, it is important to note the following factors:
+1. Users use many operators that are not part of a standard network like ResNet. Example: tensor manipulation operators such as mean, max, topk, argmax, and sort.
+2. A standard network architecture like ResNet-50 is made up of many operators, for example Convolution2D, Softmax, Dense, and more. Consider the following scenarios:
+    1. We improved the performance of the Convolution2D operator, but due to a bug, Softmax performance went down. The end-to-end benchmarks may still look fine overall, so we may miss the performance degradation of a single operator, which can accumulate and become untraceable.
+    2. You need to see which operator in a given network takes the most time so that you can plan optimization work. With end-to-end benchmarks, it is hard to get fine-grained numbers at the operator level.
+3. We need to know how different operators perform on different hardware infrastructure (for example, CPU with MKLDNN, or GPU with NVIDIA CUDA and cuDNN). With these details, we can plan the optimization work at the operator level, which could substantially improve end-to-end performance.
+4. You want to have nightly performance tests across all operators in a deep learning framework to catch regressions early.
+5. We can integrate this framework with a CI/CD system to run per-operator performance tests for PRs. Example: when a PR modifies the kernel of TransposeConv2D, we can run benchmarks of the TransposeConv2D operator to verify performance.
+
+Hence, in this utility, we will build the functionality to allow users and developers of deep learning frameworks to easily run benchmarks for individual operators.
+
+# How to use
+
+## Prerequisites
+
+This utility uses the MXNet profiler under the hood to fetch compute and memory metrics. Hence, you need to build MXNet with the `USE_PROFILER=1` flag.
+
+Make sure to build the flavor of MXNet on which you would like to measure operator performance, for example with or without MKL, or with CUDA 9 or 10.1.
+
+## Usecase 1 - Run benchmarks for all the operators
+
+The command below runs all the MXNet operator (NDArray) benchmarks with default inputs and saves the final result as JSON in the given file.
+
+```
+python incubator-mxnet/benchmark/opperf/opperf.py --output-format json --output-file mxnet_operator_benchmark_results.json
+```
+
+**Other Supported Options:**
+
+1. **output-format** : `json`, `md` for Markdown file output, or `csv`.
+
+2. **ctx** : `cpu` or `gpu`. By default, cpu on a CPU machine and gpu(0) on a GPU machine. You can override this to set the global context for all operator benchmarks. Example: `--ctx gpu(2)`.
+
+3. **dtype** : By default, `float32`. You can override this to set the global dtype for all operator benchmarks. Example: `--dtype float64`.
+
+## Usecase 2 - Run benchmarks for all the operators in a specific category
+
+For example, to run benchmarks for all NDArray arithmetic operators, run the following Python script.
+
+```
+#! /usr/bin/python
+from benchmark.opperf.tensor_operations.arithmetic_operations import run_arithmetic_operators_benchmarks
+
+# Run all Arithmetic operations benchmarks with default input values
+print(run_arithmetic_operators_benchmarks())
+```
+
+The output of the above benchmark run, on a CPU machine, looks something like this:
+
+```
+{'subtract': [{'avg_time_forward_broadcast_sub': 5.5137, 
+               'avg_time_mem_alloc_cpu/0': 207618.0469,
+               'avg_time_backward_broadcast_sub': 7.2976, 
+               'inputs': {'lhs': (1024, 1024), 'rhs': (1024, 1024)}}
+             ],
+ 'add': [{'avg_time_mem_alloc_cpu/0': 207618.0469,
+          'avg_time_forward_broadcast_add': 4.309,
+          'avg_time_backward_broadcast_add': 5.6063,
+          'inputs': {'lhs': (1024, 1024), 'rhs': (1024, 1024)}},
+        ],
+ 'multiply': [{'avg_time_backward_broadcast_mul': 19.1712,
+               'avg_time_mem_alloc_cpu/0': 207618.0469,
+               'avg_time_forward_broadcast_mul': 6.4855, 
+               'inputs': {'lhs': (1024, 1024), 'rhs': (1024, 1024)}},
+             ]
+}
+```
+
+## Usecase 3 - Run benchmarks for specific operator
+For example, to run benchmarks for the `nd.add` operator in MXNet, run the following Python script.
+
+```
+#! /usr/bin/python
 
 Review comment:
   extra space on shebang?
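
For context on the question above, a minimal sketch (not part of the PR): on Linux, the kernel skips spaces and tabs after `#!` before reading the interpreter path, so `#! /usr/bin/python` and `#!/usr/bin/python` should resolve to the same interpreter. The helper `parse_shebang` below is a hypothetical illustration of that splitting in plain Python; it only approximates the behaviour and is not the kernel's actual logic.

```python
def parse_shebang(first_line):
    """Split a shebang line into (interpreter, argument).

    Hypothetical helper for illustration only. Returns None when the
    line is not a shebang or names no interpreter.
    """
    if not first_line.startswith("#!"):
        return None
    # Whitespace between "#!" and the interpreter path is ignored.
    rest = first_line[2:].strip()
    if not rest:
        return None
    # Everything after the interpreter is treated as one optional argument.
    parts = rest.split(None, 1)
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


with_space = parse_shebang("#! /usr/bin/python")
without_space = parse_shebang("#!/usr/bin/python")
print(with_space == without_space)
```

Under that assumption the space is a style choice rather than a bug; dropping it simply matches the more common spelling.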
