Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/01/27 18:44:31 UTC

[GitHub] [incubator-mxnet] connorgoggins opened a new pull request #17449: Implemented large tensor flag for opperf testing

connorgoggins opened a new pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449
 
 
   ## Description ##
   Added a flag (`large-tensor`) and relevant default data that allow users to run the entire suite of opperf tests with large tensor data once they have built MXNet with large tensor support. Please note that several ops still have kernel-level issues with large tensor data, so tests involving those operators may throw errors.
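   For example, a user might drive one of the per-category entry points with the new flag directly from Python (a minimal sketch, assuming the repository root is importable and an MXNet build with large tensor/int64 support; argument names follow the diffs in this PR):
   ```python
   # Minimal sketch: run one opperf category with the new flag turned on.
   import mxnet as mx
   from benchmark.opperf.nd_operations.binary_operators import \
       run_mx_binary_element_wise_operators_benchmarks

   results = run_mx_binary_element_wise_operators_benchmarks(
       ctx=mx.cpu(), dtype='float32', profiler='native',
       large_tensor='on', warmup=10, runs=25)
   print(results)
   ```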
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   - [x] Code is well-documented
   - [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - M benchmark/opperf/nd_operations/array_rearrange.py
   - M benchmark/opperf/nd_operations/binary_operators.py
   - M benchmark/opperf/nd_operations/gemm_operators.py
   - M benchmark/opperf/nd_operations/nn_activation_operators.py
   - M benchmark/opperf/nd_operations/nn_basic_operators.py
   - M benchmark/opperf/nd_operations/nn_conv_operators.py
   - M benchmark/opperf/nd_operations/nn_optimizer_operators.py
   - M benchmark/opperf/nd_operations/random_sampling_operators.py
   - M benchmark/opperf/nd_operations/reduction_operators.py
   - M benchmark/opperf/nd_operations/sorting_searching_operators.py
   - M benchmark/opperf/nd_operations/unary_operators.py
   - M benchmark/opperf/opperf.py
   - M benchmark/opperf/rules/default_params.py
   - M benchmark/opperf/utils/benchmark_utils.py
   - M benchmark/opperf/utils/op_registry_utils.py
   


[GitHub] [incubator-mxnet] connorgoggins commented on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
connorgoggins commented on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-580390884
 
 
   With this flag, users can avoid having to create their own custom inputs for each operator, potentially saving a significant amount of time and effort when testing multiple ops. The flag wouldn't be particularly useful if a user has a specific input tensor shape in mind, but there are also cases where users want a quick way to get a more general picture of operator performance under large tensor conditions (e.g. to compare op performance across different machines and input sizes).
   
   Would changing the name to `int64_tensor` introduce more clarity?
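   For reference, the rename would only touch the CLI plumbing; a hypothetical argparse sketch (the option string and help text are assumptions, not the PR's exact code):
   ```python
   # Hypothetical sketch of exposing the flag under the proposed name in
   # benchmark/opperf/opperf.py; actual wording in the PR may differ.
   import argparse

   parser = argparse.ArgumentParser(description='Run MXNet operator benchmarks')
   parser.add_argument('--int64-tensor', type=str, default='off',
                       choices=['on', 'off'],
                       help='Use input tensors with more than 2**32 elements '
                            '(requires an MXNet build with int64 tensor support)')
   args = parser.parse_args()
   ```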


[GitHub] [incubator-mxnet] ChaiBapchya commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r371419201
 
 

 ##########
 File path: benchmark/opperf/nd_operations/gemm_operators.py
 ##########
 @@ -55,33 +55,64 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
 
     """
     # Benchmark tests for dot and batch_dot operators
-    dot_benchmark_res = run_performance_test(
-        [getattr(MX_OP_MODULE, "dot")], run_backward=True,
-        dtype=dtype, ctx=ctx,
-        inputs=[{"lhs": (1024, 1024),
-                 "rhs": (1024, 1024)},
-                {"lhs": (1000, 10),
-                 "rhs": (1000, 10),
-                 "transpose_b": True},
-                {"lhs": (1000, 1),
-                 "rhs": (100, 1000),
-                 "transpose_a": True,
-                 "transpose_b": True}],
-        warmup=warmup, runs=runs, profiler=profiler)
+    if large_tensor == "on":
+        print("dot")
 
 Review comment:
   Generally, it is discouraged to put print statements in the benchmark code.
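   A generic alternative (not code from this PR) is to route such diagnostics through the standard `logging` module so the output can be silenced or redirected:
   ```python
   # Generic sketch: route diagnostics through logging instead of print().
   import logging

   logging.basicConfig(level=logging.INFO)
   logger = logging.getLogger(__name__)

   large_tensor = "on"  # stands in for the flag checked in the diff above
   if large_tensor == "on":
       # debug-level, so it stays silent unless verbosity is raised
       logger.debug("Running dot/batch_dot benchmarks with large tensor inputs")
   ```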


[GitHub] [incubator-mxnet] ChaiBapchya edited a comment on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya edited a comment on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-593053525
 
 
   While the full opperf suite was run initially (and is linked in the description), was it run again after the subsequent commits, i.e. the newly added ops and merges?
   
   Could you paste the opperf results after commit 56ad70?
   
   Because right now, with master (CUDA and cuDNN ON), the full opperf suite runs into an error for lamb_update_phase1:
   ```
   Traceback (most recent call last):
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 213, in <module>
       sys.exit(main())
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 193, in main
       benchmark_results = run_all_mxnet_operator_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs)
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 111, in run_all_mxnet_operator_benchmarks
       mxnet_operator_benchmark_results.append(run_optimizer_operators_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs))
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/nd_operations/nn_optimizer_operators.py", line 142, in run_optimizer_operators_benchmarks
       mx_optimizer_op_results = run_op_benchmarks(mx_optimizer_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 210, in run_op_benchmarks
       warmup=warmup, runs=runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 177, in run_performance_test
       benchmark_result = _run_nd_operator_performance_test(op, inputs, run_backward, warmup, runs, kwargs_list, profiler)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 114, in _run_nd_operator_performance_test
       _, _ = benchmark_helper_func(op, warmup, **kwargs_list[0])
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/profiler_utils.py", line 200, in cpp_profile_it
       res = func(*args, **kwargs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/ndarray_utils.py", line 97, in nd_forward_and_profile
       res = op(**kwargs_new)
     File "<string>", line 113, in lamb_update_phase1
     File "/home/ubuntu/incubator-mxnet/python/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
       ctypes.byref(out_stypes)))
     File "/home/ubuntu/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: MXNetError: Required parameter wd of float is not presented, in operator lamb_update_phase1(name="", t="1", rescale_grad="0.4", epsilon="1e-08", beta2="0.1", beta1="0.1")
   *** Error in `python': corrupted double-linked list: 0x000055b58a93f6c0 ***
   ```
   
   The PR that introduced lamb_update_phase1 to opperf (https://github.com/apache/incubator-mxnet/pull/17542) worked with CUDA and cuDNN ON, but now it doesn't.
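   For reference, a minimal sketch (assuming an MXNet build that registers this op) showing that lamb_update_phase1 runs once the required `wd` scalar is supplied, using the same scalar values as the failing call above:
   ```python
   # Sketch: lamb_update_phase1 succeeds when the required `wd` is provided.
   import mxnet as mx

   shape = (1024, 1024)
   weight = mx.nd.random.uniform(shape=shape)
   grad = mx.nd.random.uniform(shape=shape)
   mean = mx.nd.zeros(shape)
   var = mx.nd.zeros(shape)

   out = mx.nd.lamb_update_phase1(weight=weight, grad=grad, mean=mean, var=var,
                                  beta1=0.1, beta2=0.1, epsilon=1e-08,
                                  t=1, wd=0.1, rescale_grad=0.4)
   ```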


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373705920
 
 

 ##########
 File path: benchmark/opperf/nd_operations/binary_operators.py
 ##########
 @@ -75,6 +77,8 @@ def run_mx_binary_element_wise_operators_benchmarks(ctx=mx.cpu(), dtype='float32
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Same here.


[GitHub] [incubator-mxnet] connorgoggins commented on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
connorgoggins commented on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-579410209
 
 
   @apeforest thanks for your feedback! The purpose of this flag is not only to test operator functionality on large tensor data, but also to measure the actual performance of each operator on large tensor data (which falls within the mission of opperf). With this in mind, I believe it makes sense to add this as a parameter to the utility.
   
   This would be valuable to users who are interested in debugging their models' performance at the operator level on large tensor data, thereby helping users create more efficient models when handling high-dimensional data.
   
   I can refactor this into a general `run_large_tensor_test` function if you would prefer, but I think users may sometimes want to test specific ops and categories of ops on large tensor data instead of being forced to test all ops at the same time.
   
   If the consensus is that this would be better as a private branch, I can move in that direction instead.
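   For illustration, such a `run_large_tensor_test` wrapper might look roughly like this (hypothetical helper; the delegated function exists in benchmark/opperf/opperf.py, but the flag's keyword name is an assumption, since elsewhere in this thread it appears as `int64_tensor`):
   ```python
   # Hypothetical run_large_tensor_test wrapper over the full opperf suite.
   import mxnet as mx

   def run_large_tensor_test(ctx=mx.cpu(), dtype='float32', profiler='native',
                             warmup=25, runs=100):
       """Run the full opperf suite with large (int64) tensor inputs."""
       from benchmark.opperf.opperf import run_all_mxnet_operator_benchmarks
       return run_all_mxnet_operator_benchmarks(ctx=ctx, dtype=dtype,
                                                profiler=profiler,
                                                large_tensor='on',
                                                warmup=warmup, runs=runs)
   ```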


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707500
 
 

 ##########
 File path: benchmark/opperf/nd_operations/nn_conv_operators.py
 ##########
 @@ -60,131 +81,286 @@ def run_pooling_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='na
     pool2d_benchmark_res = []
     for pool_type in pool_types:
         for global_pool in global_pool_types:
-            for pool1d_data in [(32, 3, 256), (32, 3, 64)]:
-                pool1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
-                                                             run_backward=True,
-                                                             dtype=dtype,
-                                                             ctx=ctx,
-                                                             profiler=profiler,
-                                                             inputs=[{"data": pool1d_data,
-                                                                      "kernel": 3,
-                                                                      "pool_type": pool_type,
-                                                                      "global_pool": global_pool,
-                                                                      "stride": 1,
-                                                                      "pad": 1}
-                                                                     ],
-                                                             warmup=warmup,
-                                                             runs=runs)
-            for pool2d_data in [(32, 3, 256, 256), (32, 3, 64, 64)]:
-                pool2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
-                                                             run_backward=True,
-                                                             dtype=dtype,
-                                                             ctx=ctx,
-                                                             profiler=profiler,
-                                                             inputs=[{"data": pool2d_data,
-                                                                      "kernel": (3, 3),
-                                                                      "pool_type": pool_type,
-                                                                      "global_pool": global_pool,
-                                                                      "stride": (1, 1),
-                                                                      "pad": (0, 0)}
-                                                                     ],
-                                                             warmup=warmup,
-                                                             runs=runs)
+            if large_tensor == 'on':
+                for pool1d_data in [(1, 1, 2**32), (2**31, 1, 3)]:
+                    pool1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool1d_data,
+                                                                          "kernel": 3,
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": 1,
+                                                                          "pad": 1}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
+                for pool2d_data in [(2**29, 1, 3, 3), (2**28, 1, 4, 4)]:
+                    pool2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool2d_data,
+                                                                          "kernel": (3, 3),
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": (1, 1),
+                                                                          "pad": (0, 0)}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
+            else:
+                for pool1d_data in [(32, 3, 256), (32, 3, 64)]:
+                    pool1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool1d_data,
+                                                                          "kernel": 3,
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": 1,
+                                                                          "pad": 1}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
+                for pool2d_data in [(32, 3, 256, 256), (32, 3, 64, 64)]:
+                    pool2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool2d_data,
+                                                                          "kernel": (3, 3),
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": (1, 1),
+                                                                          "pad": (0, 0)}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
     # Prepare combined results
     mx_pooling_op_results = merge_map_list(pool1d_benchmark_res + pool2d_benchmark_res)
     return mx_pooling_op_results
 
 
-def run_convolution_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
-    # Conv1D Benchmarks
+def run_convolution_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', large_tensor='off', warmup=25, runs=100):
+    """Runs benchmarks with the given context, precision (dtype), and input data size (large_tensor) for all the convolution
+    operators in MXNet.
+
+    Parameters
+    ----------
+    ctx: mx.ctx
+        Context to run benchmarks
+    dtype: str, default 'float32'
+        Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
+    warmup: int, default 25
+        Number of times to run for warmup
+    runs: int, default 100
+        Number of runs to capture benchmark results
+
+    Returns
+    -------
+    Dictionary of results. Key -> Name of the operator, Value -> Benchmark results.
+
+    """
     conv1d_benchmark_res = []
-    for conv_data in [(32, 3, 256), (32, 3, 64)]:
-        conv1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
-                                                     run_backward=True,
-                                                     dtype=dtype,
-                                                     ctx=ctx,
-                                                     profiler=profiler,
-                                                     inputs=[{"data": conv_data,
-                                                              "weight": (64, 3, 3),
-                                                              "bias": (64,),
-                                                              "kernel": (3,),
-                                                              "stride": (1,),
-                                                              "dilate": (1,),
-                                                              "pad": (0,),
-                                                              "num_filter": 64,
-                                                              "layout": 'NCW'}
-                                                             ],
-                                                     warmup=warmup,
-                                                     runs=runs)
-    # Conv2D Benchmarks
     conv2d_benchmark_res = []
-    for conv_data in [(32, 3, 256, 256), (32, 3, 64, 64)]:
-        conv2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
-                                                     run_backward=True,
-                                                     dtype=dtype,
-                                                     ctx=ctx,
-                                                     profiler=profiler,
-                                                     inputs=[{"data": conv_data,
-                                                              "weight": (64, 3, 3, 3),
-                                                              "bias": (64,),
-                                                              "kernel": (3, 3),
-                                                              "stride": (1, 1),
-                                                              "dilate": (1, 1),
-                                                              "pad": (0, 0),
-                                                              "num_filter": 64,
-                                                              "layout": 'NCHW'}
-                                                             ],
-                                                     warmup=warmup,
-                                                     runs=runs)
+    if large_tensor == 'on':
+        # Conv1D Benchmarks
+        for conv_data in [(2**30, 1, 4), (2**31, 1, 3)]:
+            conv1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
+                                                         run_backward=True,
+                                                         dtype=dtype,
+                                                         ctx=ctx,
+                                                         profiler=profiler,
+                                                         inputs=[{"data": conv_data,
+                                                                  "weight": (1, 1, 3),
+                                                                  "bias": (1,),
+                                                                  "kernel": (3,),
+                                                                  "stride": (1,),
+                                                                  "dilate": (1,),
+                                                                  "pad": (0,),
+                                                                  "num_filter": 1,
+                                                                  "layout": 'NCW'}
+                                                                ],
+                                                         warmup=warmup,
+                                                         runs=runs)
+        # Conv2D Benchmarks
+        for conv_data in [(2**29, 1, 3, 3), (2**28, 1, 4, 4)]:
+            conv2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
+                                                         run_backward=True,
+                                                         dtype=dtype,
+                                                         ctx=ctx,
+                                                         profiler=profiler,
+                                                         inputs=[{"data": conv_data,
+                                                                  "weight": (1, 1, 3, 3),
+                                                                  "bias": (1,),
+                                                                  "kernel": (3, 3),
+                                                                  "stride": (1, 1),
+                                                                  "dilate": (1, 1),
+                                                                  "pad": (0, 0),
+                                                                  "num_filter": 1,
+                                                                  "layout": 'NCHW'}
+                                                                ],
+                                                         warmup=warmup,
+                                                         runs=runs)
+    else:
 
 Review comment:
   It seems the only difference between the if and else branch is the `inputs` argument. Can we only generate different inputs in the if/else branch and pass them to the same operator function?
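   Concretely, the suggestion amounts to something like the following sketch for the Conv1D case (illustrative only; it reuses the surrounding function's variables and helpers from the diff above):
   ```python
   # Sketch of the suggestion: choose the inputs list in the if/else, then make
   # a single run_performance_test call. Assumes the surrounding function's
   # variables (large_tensor, dtype, ctx, profiler, warmup, runs) and helpers
   # (run_performance_test, MX_OP_MODULE) as defined in the diff above.
   if large_tensor == 'on':
       conv1d_inputs = [{"data": conv_data, "weight": (1, 1, 3), "bias": (1,),
                         "kernel": (3,), "stride": (1,), "dilate": (1,),
                         "pad": (0,), "num_filter": 1, "layout": 'NCW'}
                        for conv_data in [(2**30, 1, 4), (2**31, 1, 3)]]
   else:
       conv1d_inputs = [{"data": conv_data, "weight": (64, 3, 3), "bias": (64,),
                         "kernel": (3,), "stride": (1,), "dilate": (1,),
                         "pad": (0,), "num_filter": 64, "layout": 'NCW'}
                        for conv_data in [(32, 3, 256), (32, 3, 64)]]

   conv1d_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
                                               run_backward=True, dtype=dtype,
                                               ctx=ctx, profiler=profiler,
                                               inputs=conv1d_inputs,
                                               warmup=warmup, runs=runs)
   ```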


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707739
 
 

 ##########
 File path: benchmark/opperf/nd_operations/sorting_searching_operators.py
 ##########
 @@ -39,6 +39,8 @@ def run_sorting_searching_operators_benchmarks(ctx=mx.cpu(), dtype='float32', pr
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Be more specific please.


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707535
 
 

 ##########
 File path: benchmark/opperf/nd_operations/nn_conv_operators.py
 ##########
 @@ -60,131 +81,286 @@ def run_pooling_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='na
     pool2d_benchmark_res = []
     for pool_type in pool_types:
         for global_pool in global_pool_types:
-            for pool1d_data in [(32, 3, 256), (32, 3, 64)]:
-                pool1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
-                                                             run_backward=True,
-                                                             dtype=dtype,
-                                                             ctx=ctx,
-                                                             profiler=profiler,
-                                                             inputs=[{"data": pool1d_data,
-                                                                      "kernel": 3,
-                                                                      "pool_type": pool_type,
-                                                                      "global_pool": global_pool,
-                                                                      "stride": 1,
-                                                                      "pad": 1}
-                                                                     ],
-                                                             warmup=warmup,
-                                                             runs=runs)
-            for pool2d_data in [(32, 3, 256, 256), (32, 3, 64, 64)]:
-                pool2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
-                                                             run_backward=True,
-                                                             dtype=dtype,
-                                                             ctx=ctx,
-                                                             profiler=profiler,
-                                                             inputs=[{"data": pool2d_data,
-                                                                      "kernel": (3, 3),
-                                                                      "pool_type": pool_type,
-                                                                      "global_pool": global_pool,
-                                                                      "stride": (1, 1),
-                                                                      "pad": (0, 0)}
-                                                                     ],
-                                                             warmup=warmup,
-                                                             runs=runs)
+            if large_tensor == 'on':
+                for pool1d_data in [(1, 1, 2**32), (2**31, 1, 3)]:
+                    pool1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool1d_data,
+                                                                          "kernel": 3,
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": 1,
+                                                                          "pad": 1}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
+                for pool2d_data in [(2**29, 1, 3, 3), (2**28, 1, 4, 4)]:
+                    pool2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool2d_data,
+                                                                          "kernel": (3, 3),
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": (1, 1),
+                                                                          "pad": (0, 0)}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
+            else:
+                for pool1d_data in [(32, 3, 256), (32, 3, 64)]:
+                    pool1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool1d_data,
+                                                                          "kernel": 3,
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": 1,
+                                                                          "pad": 1}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
+                for pool2d_data in [(32, 3, 256, 256), (32, 3, 64, 64)]:
+                    pool2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool2d_data,
+                                                                          "kernel": (3, 3),
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": (1, 1),
+                                                                          "pad": (0, 0)}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
     # Prepare combined results
     mx_pooling_op_results = merge_map_list(pool1d_benchmark_res + pool2d_benchmark_res)
     return mx_pooling_op_results
 
 
-def run_convolution_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
-    # Conv1D Benchmarks
+def run_convolution_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', large_tensor='off', warmup=25, runs=100):
+    """Runs benchmarks with the given context, precision (dtype), and input data size (large_tensor) for all the convolution
+    operators in MXNet.
+
+    Parameters
+    ----------
+    ctx: mx.ctx
+        Context to run benchmarks
+    dtype: str, default 'float32'
+        Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
+    warmup: int, default 25
+        Number of times to run for warmup
+    runs: int, default 100
+        Number of runs to capture benchmark results
+
+    Returns
+    -------
+    Dictionary of results. Key -> Name of the operator, Value -> Benchmark results.
+
+    """
     conv1d_benchmark_res = []
-    for conv_data in [(32, 3, 256), (32, 3, 64)]:
-        conv1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
-                                                     run_backward=True,
-                                                     dtype=dtype,
-                                                     ctx=ctx,
-                                                     profiler=profiler,
-                                                     inputs=[{"data": conv_data,
-                                                              "weight": (64, 3, 3),
-                                                              "bias": (64,),
-                                                              "kernel": (3,),
-                                                              "stride": (1,),
-                                                              "dilate": (1,),
-                                                              "pad": (0,),
-                                                              "num_filter": 64,
-                                                              "layout": 'NCW'}
-                                                             ],
-                                                     warmup=warmup,
-                                                     runs=runs)
-    # Conv2D Benchmarks
     conv2d_benchmark_res = []
-    for conv_data in [(32, 3, 256, 256), (32, 3, 64, 64)]:
-        conv2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
-                                                     run_backward=True,
-                                                     dtype=dtype,
-                                                     ctx=ctx,
-                                                     profiler=profiler,
-                                                     inputs=[{"data": conv_data,
-                                                              "weight": (64, 3, 3, 3),
-                                                              "bias": (64,),
-                                                              "kernel": (3, 3),
-                                                              "stride": (1, 1),
-                                                              "dilate": (1, 1),
-                                                              "pad": (0, 0),
-                                                              "num_filter": 64,
-                                                              "layout": 'NCHW'}
-                                                             ],
-                                                     warmup=warmup,
-                                                     runs=runs)
+    if large_tensor == 'on':
+        # Conv1D Benchmarks
+        for conv_data in [(2**30, 1, 4), (2**31, 1, 3)]:
+            conv1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
+                                                         run_backward=True,
+                                                         dtype=dtype,
+                                                         ctx=ctx,
+                                                         profiler=profiler,
+                                                         inputs=[{"data": conv_data,
+                                                                  "weight": (1, 1, 3),
+                                                                  "bias": (1,),
+                                                                  "kernel": (3,),
+                                                                  "stride": (1,),
+                                                                  "dilate": (1,),
+                                                                  "pad": (0,),
+                                                                  "num_filter": 1,
+                                                                  "layout": 'NCW'}
+                                                                ],
+                                                         warmup=warmup,
+                                                         runs=runs)
+        # Conv2D Benchmarks
+        for conv_data in [(2**29, 1, 3, 3), (2**28, 1, 4, 4)]:
+            conv2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
+                                                         run_backward=True,
+                                                         dtype=dtype,
+                                                         ctx=ctx,
+                                                         profiler=profiler,
+                                                         inputs=[{"data": conv_data,
+                                                                  "weight": (1, 1, 3, 3),
+                                                                  "bias": (1,),
+                                                                  "kernel": (3, 3),
+                                                                  "stride": (1, 1),
+                                                                  "dilate": (1, 1),
+                                                                  "pad": (0, 0),
+                                                                  "num_filter": 1,
+                                                                  "layout": 'NCHW'}
+                                                                ],
+                                                         warmup=warmup,
+                                                         runs=runs)
+    else:
+        # Conv1D Benchmarks
+        for conv_data in [(32, 3, 256), (32, 3, 64)]:
+            conv1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
+                                                         run_backward=True,
+                                                         dtype=dtype,
+                                                         ctx=ctx,
+                                                         profiler=profiler,
+                                                         inputs=[{"data": conv_data,
+                                                                  "weight": (64, 3, 3),
+                                                                  "bias": (64,),
+                                                                  "kernel": (3,),
+                                                                  "stride": (1,),
+                                                                  "dilate": (1,),
+                                                                  "pad": (0,),
+                                                                  "num_filter": 64,
+                                                                  "layout": 'NCW'}
+                                                                ],
+                                                         warmup=warmup,
+                                                         runs=runs)
+        # Conv2D Benchmarks
+        for conv_data in [(32, 3, 256, 256), (32, 3, 64, 64)]:
+            conv2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Convolution")],
+                                                         run_backward=True,
+                                                         dtype=dtype,
+                                                         ctx=ctx,
+                                                         profiler=profiler,
+                                                         inputs=[{"data": conv_data,
+                                                                  "weight": (64, 3, 3, 3),
+                                                                  "bias": (64,),
+                                                                  "kernel": (3, 3),
+                                                                  "stride": (1, 1),
+                                                                  "dilate": (1, 1),
+                                                                  "pad": (0, 0),
+                                                                  "num_filter": 64,
+                                                                  "layout": 'NCHW'}
+                                                                ],
+                                                         warmup=warmup,
+                                                         runs=runs)
     # Prepare combined results
     mx_conv_op_results = merge_map_list(conv1d_benchmark_res + conv2d_benchmark_res)
     return mx_conv_op_results
 
 
-def run_transpose_convolution_operators_benchmarks(ctx=mx.cpu(), profiler='native', dtype='float32', warmup=10, runs=50):
+def run_transpose_convolution_operators_benchmarks(ctx=mx.cpu(), profiler='native', large_tensor='off', dtype='float32', warmup=10, runs=50):
+    """Runs benchmarks with the given context, precision (dtype), and input data size (large_tensor) for all the transpose convolution
+    operators in MXNet.
+
+    Parameters
+    ----------
+    ctx: mx.ctx
+        Context to run benchmarks
+    dtype: str, default 'float32'
+        Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
+    warmup: int, default 25
+        Number of times to run for warmup
+    runs: int, default 100
+        Number of runs to capture benchmark results
+
+    Returns
+    -------
+    Dictionary of results. Key -> Name of the operator, Value -> Benchmark results.
+
+    """
     # Conv1DTranspose Benchmarks
     conv1d_transpose_benchmark_res = []
-    for conv_data in [(32, 3, 256), (32, 3, 64)]:
-        conv1d_transpose_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Deconvolution")],
-                                                               run_backward=True,
-                                                               dtype=dtype,
-                                                               ctx=ctx,
-                                                               profiler=profiler,
-                                                               inputs=[{"data": conv_data,
-                                                                        "weight": (3, 64, 3),
-                                                                        "bias": (64,),
-                                                                        "kernel": (3,),
-                                                                        "stride": (1,),
-                                                                        "dilate": (1,),
-                                                                        "pad": (0,),
-                                                                        "adj": (0,),
-                                                                        "num_filter": 64,
-                                                                        "no_bias": False,
-                                                                        "layout": 'NCW'}
-                                                                       ],
-                                                               warmup=warmup,
-                                                               runs=runs)
-    # Conv2DTranspose Benchmarks
-    conv2d_transpose_benchmark_res = []
-    for conv_data in [(32, 3, 256, 256), (32, 3, 64, 64)]:
-        conv2d_transpose_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Deconvolution")],
-                                                               run_backward=True,
-                                                               dtype=dtype,
-                                                               ctx=ctx,
-                                                               profiler=profiler,
-                                                               inputs=[{"data": conv_data,
-                                                                        "weight": (3, 64, 3, 3),
-                                                                        "bias": (64,),
-                                                                        "kernel": (3, 3),
-                                                                        "stride": (1, 1),
-                                                                        "dilate": (1, 1),
-                                                                        "pad": (0, 0),
-                                                                        "num_filter": 64,
-                                                                        "no_bias": False,
-                                                                        "layout": 'NCHW'}
-                                                                       ],
-                                                               warmup=warmup,
-                                                               runs=runs)
+    if large_tensor == 'on':
+        for conv_data in [(2**30, 1, 4), (2**31, 1, 3)]:
+            conv1d_transpose_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Deconvolution")],
+                                                                   run_backward=True,
+                                                                   dtype=dtype,
+                                                                   ctx=ctx,
+                                                                   profiler=profiler,
+                                                                   inputs=[{"data": conv_data,
+                                                                            "weight": (1, 1, 3),
+                                                                            "bias": (1,),
+                                                                            "kernel": (3,),
+                                                                            "stride": (1,),
+                                                                            "dilate": (1,),
+                                                                            "pad": (0,),
+                                                                            "num_filter": 1,
+                                                                            "no_bias": False,
+                                                                            "layout": 'NCW'}
+                                                                          ],
+                                                                   warmup=warmup,
+                                                                   runs=runs)
+        # Conv2DTranspose Benchmarks
+        conv2d_transpose_benchmark_res = []
+        for conv_data in [(2**29, 1, 3, 3), (2**28, 1, 4, 4)]:
+            conv2d_transpose_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Deconvolution")],
+                                                                   run_backward=True,
+                                                                   dtype=dtype,
+                                                                   ctx=ctx,
+                                                                   profiler=profiler,
+                                                                   inputs=[{"data": conv_data,
+                                                                            "weight": (1, 1, 3, 3),
+                                                                            "bias": (1,),
+                                                                            "kernel": (3, 3),
+                                                                            "stride": (1, 1),
+                                                                            "pad": (0, 0),
+                                                                            "num_filter": 1,
+                                                                            "no_bias": False,
+                                                                            "layout": 'NCHW'}
+                                                                          ],
+                                                                   warmup=warmup,
+                                                                   runs=runs)
+    else:
 
 Review comment:
   It seems the only difference between the if and else branch is the `inputs` argument. Can we only generate different inputs in the if/else branch and pass them to the same operator function?


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373705688
 
 

 ##########
 File path: benchmark/opperf/nd_operations/array_rearrange.py
 ##########
 @@ -39,6 +39,8 @@ def run_rearrange_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Please specify explicitly here that the tensor size is over 2^32.


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707455
 
 

 ##########
 File path: benchmark/opperf/nd_operations/nn_conv_operators.py
 ##########
 @@ -60,131 +81,286 @@ def run_pooling_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='na
     pool2d_benchmark_res = []
     for pool_type in pool_types:
         for global_pool in global_pool_types:
-            for pool1d_data in [(32, 3, 256), (32, 3, 64)]:
-                pool1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
-                                                             run_backward=True,
-                                                             dtype=dtype,
-                                                             ctx=ctx,
-                                                             profiler=profiler,
-                                                             inputs=[{"data": pool1d_data,
-                                                                      "kernel": 3,
-                                                                      "pool_type": pool_type,
-                                                                      "global_pool": global_pool,
-                                                                      "stride": 1,
-                                                                      "pad": 1}
-                                                                     ],
-                                                             warmup=warmup,
-                                                             runs=runs)
-            for pool2d_data in [(32, 3, 256, 256), (32, 3, 64, 64)]:
-                pool2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
-                                                             run_backward=True,
-                                                             dtype=dtype,
-                                                             ctx=ctx,
-                                                             profiler=profiler,
-                                                             inputs=[{"data": pool2d_data,
-                                                                      "kernel": (3, 3),
-                                                                      "pool_type": pool_type,
-                                                                      "global_pool": global_pool,
-                                                                      "stride": (1, 1),
-                                                                      "pad": (0, 0)}
-                                                                     ],
-                                                             warmup=warmup,
-                                                             runs=runs)
+            if large_tensor == 'on':
+                for pool1d_data in [(1, 1, 2**32), (2**31, 1, 3)]:
+                    pool1d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool1d_data,
+                                                                          "kernel": 3,
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": 1,
+                                                                          "pad": 1}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
+                for pool2d_data in [(2**29, 1, 3, 3), (2**28, 1, 4, 4)]:
+                    pool2d_benchmark_res += run_performance_test([getattr(MX_OP_MODULE, "Pooling")],
+                                                                 run_backward=True,
+                                                                 dtype=dtype,
+                                                                 ctx=ctx,
+                                                                 profiler=profiler,
+                                                                 inputs=[{"data": pool2d_data,
+                                                                          "kernel": (3, 3),
+                                                                          "pool_type": pool_type,
+                                                                          "global_pool": global_pool,
+                                                                          "stride": (1, 1),
+                                                                          "pad": (0, 0)}
+                                                                        ],
+                                                                 warmup=warmup,
+                                                                 runs=runs)
+            else:
 
 Review comment:
   It seems the only difference between the if and else branches is the `inputs` argument. Can we generate only the inputs in the if/else branches and pass them to a single call of the same operator function?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] connorgoggins commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
connorgoggins commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r371514537
 
 

 ##########
 File path: benchmark/opperf/nd_operations/nn_activation_operators.py
 ##########
 @@ -55,55 +55,106 @@ def run_activation_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler=
     Dictionary of results. Key -> Name of the operator, Value -> Benchmark results.
 
     """
-    # Relu and its variation
-    relu_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "LeakyReLU")],
-                                              run_backward=True,
-                                              dtype=dtype,
-                                              ctx=ctx,
-                                              profiler=profiler,
-                                              inputs=[{"data": (1024, 1024), "act_type": "leaky", "slope": 0.1},
-                                                      {"data": (10000, 1), "act_type": "leaky", "slope": 0.1},
-                                                      {"data": (10000, 100), "act_type": "leaky", "slope": 0.1},
-                                                      {"data": (1024, 1024), "act_type": "elu", "slope": 0.1},
-                                                      {"data": (10000, 1), "act_type": "elu", "slope": 0.1},
-                                                      {"data": (10000, 100), "act_type": "elu", "slope": 0.1},
-                                                      {"data": (1024, 1024), "act_type": "selu"},
-                                                      {"data": (10000, 1), "act_type": "selu"},
-                                                      {"data": (10000, 100), "act_type": "selu"},
-                                                      {"data": (1024, 1024), "act_type": "prelu", "gamma": (1, 1024)},
-                                                      {"data": (10000, 1), "act_type": "prelu", "gamma": (1, 1)},
-                                                      {"data": (10000, 100), "act_type": "prelu", "gamma": (1, 100)}
-                                                      ],
-                                              warmup=warmup,
-                                              runs=runs)
+    if large_tensor == 'on':
+        # Relu and its variation
+        relu_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "LeakyReLU")],
+                                                run_backward=True,
+                                                dtype=dtype,
+                                                ctx=ctx,
+                                                profiler=profiler,
+                                                inputs=[{"data": (2**16, 2**16), "act_type": "leaky", "slope": 0.1},
+                                                        {"data": (2**4, 2**28), "act_type": "leaky", "slope": 0.1},
+                                                        {"data": (4, 2**30), "act_type": "leaky", "slope": 0.1},
+                                                        {"data": (2**16, 2**16), "act_type": "elu", "slope": 0.1},
+                                                        {"data": (2**4, 2**28), "act_type": "elu", "slope": 0.1},
+                                                        {"data": (4, 2**30), "act_type": "elu", "slope": 0.1},
+                                                        {"data": (2**16, 2**16), "act_type": "selu"},
+                                                        {"data": (2**4, 2**28), "act_type": "selu"},
+                                                        {"data": (4, 2**30), "act_type": "selu"},
+                                                        {"data": (2**16, 2**16), "act_type": "prelu", "gamma": (1, 2**16)},
+                                                        {"data": (2**4, 2**28), "act_type": "prelu", "gamma": (1, 2**28)},
+                                                        {"data": (4, 2**30), "act_type": "prelu", "gamma": (1, 2**30)}
+                                                        ],
+                                                warmup=warmup,
+                                                runs=runs)
 
-    # Sigmoid => Covered as part of Unary ops
-    # Hard_Sigmoid
-    hard_sigmoid_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "hard_sigmoid")],
-                                                      run_backward=True,
-                                                      dtype=dtype,
-                                                      ctx=ctx,
-                                                      profiler=profiler,
-                                                      inputs=[{"data": (1024, 1024), "alpha": 0.25, "beta": 0.5},
-                                                              {"data": (10000, 1), "alpha": 0.25, "beta": 0.5},
-                                                              {"data": (10000, 100), "alpha": 0.25, "beta": 0.5}
-                                                              ],
-                                                      warmup=warmup,
-                                                      runs=runs)
+        # Sigmoid => Covered as part of Unary ops
+        # Hard_Sigmoid
+        hard_sigmoid_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "hard_sigmoid")],
+                                                        run_backward=True,
 
 Review comment:
   Agreed - fixed lint indentation errors.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] connorgoggins commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
connorgoggins commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r378430775
 
 

 ##########
 File path: benchmark/opperf/nd_operations/gemm_operators.py
 ##########
 @@ -55,33 +57,62 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
 
     """
     # Benchmark tests for dot and batch_dot operators
-    dot_benchmark_res = run_performance_test(
-        [getattr(MX_OP_MODULE, "dot")], run_backward=True,
-        dtype=dtype, ctx=ctx,
-        inputs=[{"lhs": (1024, 1024),
-                 "rhs": (1024, 1024)},
-                {"lhs": (1000, 10),
-                 "rhs": (1000, 10),
-                 "transpose_b": True},
-                {"lhs": (1000, 1),
-                 "rhs": (100, 1000),
-                 "transpose_a": True,
-                 "transpose_b": True}],
-        warmup=warmup, runs=runs, profiler=profiler)
+    if large_tensor == "on":
 
 Review comment:
   The purpose of this flag wouldn't be for use with user-specified shapes; it would be for general category and full-suite testing of operator performance on input data with dimensions >= 2^32. If the user wanted to test individual operators with custom shapes, they would use `run_performance_test()` and add their custom data as input - they wouldn't use the flag in that case, as the `run_performance_test()` function doesn't take the `large_tensor` flag as an argument.
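
   A rough sketch of the two usage paths described above (an editor's illustration; the import paths follow the file list in this PR, and the `run_performance_test` call mirrors the `nd.add` example quoted elsewhere in this thread):

   ```
   import mxnet as mx
   from mxnet import nd

   # 1) Category-level run that picks up the default large-tensor shapes via the flag:
   from benchmark.opperf.nd_operations.gemm_operators import run_gemm_operators_benchmarks
   gemm_results = run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', large_tensor='on')

   # 2) Individual operator with a user-chosen shape; the flag is not involved here:
   from benchmark.opperf.utils.benchmark_utils import run_performance_test
   run_performance_test(nd.add, run_backward=True, dtype='float32', ctx=mx.cpu(),
                        inputs=[{"lhs": (2**32 + 1, 1), "rhs": (2**32 + 1, 1)}],
                        warmup=0, runs=1)
   ```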

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373706497
 
 

 ##########
 File path: benchmark/opperf/nd_operations/nn_activation_operators.py
 ##########
 @@ -45,6 +45,8 @@ def run_activation_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler=
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Same here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707309
 
 

 ##########
 File path: benchmark/opperf/nd_operations/gemm_operators.py
 ##########
 @@ -55,33 +57,62 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
 
     """
     # Benchmark tests for dot and batch_dot operators
-    dot_benchmark_res = run_performance_test(
-        [getattr(MX_OP_MODULE, "dot")], run_backward=True,
-        dtype=dtype, ctx=ctx,
-        inputs=[{"lhs": (1024, 1024),
-                 "rhs": (1024, 1024)},
-                {"lhs": (1000, 10),
-                 "rhs": (1000, 10),
-                 "transpose_b": True},
-                {"lhs": (1000, 1),
-                 "rhs": (100, 1000),
-                 "transpose_a": True,
-                 "transpose_b": True}],
-        warmup=warmup, runs=runs, profiler=profiler)
+    if large_tensor == "on":
+        dot_benchmark_res = run_performance_test(
+            [getattr(MX_OP_MODULE, "dot")], run_backward=True,
+            dtype=dtype, ctx=ctx,
+            inputs=[{"lhs": (2**16, 2**16),
+                     "rhs": (2**16, 2**16)},
+                    {"lhs": (4, 2**30),
+                     "rhs": (4, 2**30),
+                     "transpose_b": True},
+                    {"lhs": (2**28, 16),
+                     "rhs": (16, 2**28),
+                     "transpose_a": True,
+                     "transpose_b": True}],
+            warmup=warmup, runs=runs, profiler=profiler)
 
-    batch_dot_benchmark_res = run_performance_test(
-        [getattr(MX_OP_MODULE, "batch_dot")], run_backward=True,
-        dtype=dtype, ctx=ctx,
-        inputs=[{"lhs": (32, 1024, 1024),
-                 "rhs": (32, 1024, 1024)},
-                {"lhs": (32, 1000, 10),
-                 "rhs": (32, 1000, 10),
-                 "transpose_b": True},
-                {"lhs": (32, 1000, 1),
-                 "rhs": (32, 100, 1000),
-                 "transpose_a": True,
-                 "transpose_b": True}],
-        warmup=warmup, runs=runs, profiler=profiler)
+        batch_dot_benchmark_res = run_performance_test(
+            [getattr(MX_OP_MODULE, "batch_dot")], run_backward=True,
+            dtype=dtype, ctx=ctx,
+            inputs=[{"lhs": (1, 2**16, 2**16),
+                     "rhs": (1, 2**16, 2**16)},
+                    {"lhs": (1, 4, 2**30),
+                     "rhs": (1, 4, 2**30),
+                     "transpose_b": True},
+                    {"lhs": (1, 2**28, 16),
+                     "rhs": (1, 16, 2**28),
+                     "transpose_a": True,
+                     "transpose_b": True}],
+            warmup=warmup, runs=runs, profiler=profiler)
+    else:
 
 Review comment:
   It seems the only difference between the if and else branches is the `inputs` argument. Can we generate only the inputs in the if/else branches and pass them to a single call of the same operator function?
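
   For concreteness, a minimal sketch of that consolidation for the `dot` benchmark quoted above (an editor's illustration that would live inside `run_gemm_operators_benchmarks`, reusing the module's existing imports; the shapes are copied from this diff):

   ```
   if large_tensor == "on":
       dot_inputs = [{"lhs": (2**16, 2**16), "rhs": (2**16, 2**16)},
                     {"lhs": (4, 2**30), "rhs": (4, 2**30), "transpose_b": True},
                     {"lhs": (2**28, 16), "rhs": (16, 2**28),
                      "transpose_a": True, "transpose_b": True}]
   else:
       dot_inputs = [{"lhs": (1024, 1024), "rhs": (1024, 1024)},
                     {"lhs": (1000, 10), "rhs": (1000, 10), "transpose_b": True},
                     {"lhs": (1000, 1), "rhs": (100, 1000),
                      "transpose_a": True, "transpose_b": True}]

   # Single call shared by both branches; only the inputs differ.
   dot_benchmark_res = run_performance_test(
       [getattr(MX_OP_MODULE, "dot")], run_backward=True,
       dtype=dtype, ctx=ctx, inputs=dot_inputs,
       warmup=warmup, runs=runs, profiler=profiler)
   ```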

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707375
 
 

 ##########
 File path: benchmark/opperf/nd_operations/nn_basic_operators.py
 ##########
 @@ -29,58 +29,132 @@
 """
 
 
-def run_nn_basic_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
-    # FullyConnnected operator benchmarks
-    fc_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "FullyConnected")],
-                                            run_backward=True,
-                                            dtype=dtype,
-                                            ctx=ctx,
-                                            profiler=profiler,
-                                            inputs=[{"data": (32, 3, 256, 256),
-                                                     "num_hidden": 64,
-                                                     "weight": (64, 3 * 256 * 256),
-                                                     "bias": (64,),
-                                                     "flatten": True},
-                                                    {"data": (32, 3, 256, 256),
-                                                     "num_hidden": 64,
-                                                     "weight": (64, 256),
-                                                     "bias": (64,),
-                                                     "flatten": False}],
-                                            warmup=warmup,
-                                            runs=runs)
+def run_nn_basic_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', large_tensor='off', warmup=25, runs=100):
+    """Runs benchmarks with the given context, precision (dtype), and data size (large_tensor) for all the basic neural network
+    operators in MXNet.
 
-    # Dropout benchmarks
-    dropout_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "Dropout")],
-                                                 run_backward=True,
-                                                 dtype=dtype,
-                                                 ctx=ctx,
-                                                 profiler=profiler,
-                                                 inputs=[{"data": (32, 3, 256, 256),
-                                                          "p": 0.5,
-                                                          "mode": "always"},
-                                                         {"data": (10000, 10),
-                                                          "p": 0.5,
-                                                          "mode": "always"}],
-                                                 warmup=warmup,
-                                                 runs=runs)
-    # BatchNorm benchmarks
-    batchnorm_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "BatchNorm")],
-                                                   run_backward=True,
-                                                   dtype=dtype,
-                                                   ctx=ctx,
-                                                   profiler=profiler,
-                                                   inputs=[{"data": (32, 3, 256, 256),
-                                                            "gamma": (3,),
-                                                            "beta": (3,),
-                                                            "moving_mean": (3,),
-                                                            "moving_var": (3,)},
-                                                           {"data": (32, 3, 10000, 10),
-                                                            "gamma": (3,),
-                                                            "beta": (3,),
-                                                            "moving_mean": (3,),
-                                                            "moving_var": (3,)}],
-                                                   warmup=warmup,
-                                                   runs=runs)
+    Parameters
+    ----------
+    ctx: mx.ctx
+        Context to run benchmarks
+    dtype: str, default 'float32'
+        Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
+    warmup: int, default 25
+        Number of times to run for warmup
+    runs: int, default 100
+        Number of runs to capture benchmark results
+
+    Returns
+    -------
+    Dictionary of results. Key -> Name of the operator, Value -> Benchmark results.
+
+    """
+    if large_tensor == 'on':
+        # FullyConnnected operator benchmarks
+        fc_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "FullyConnected")],
+                                                run_backward=True,
+                                                dtype=dtype,
+                                                ctx=ctx,
+                                                profiler=profiler,
+                                                inputs=[{"data": (2**15, 3, 256, 256),
+                                                         "num_hidden": 64,
+                                                         "weight": (64, 3 * 256 * 256),
+                                                         "bias": (64,),
+                                                         "flatten": True},
+                                                        {"data": (2**17, 3, 128, 128),
+                                                         "num_hidden": 64,
+                                                         "weight": (64, 3 * 128 * 128),
+                                                         "bias": (64,),
+                                                         "flatten": False}],
+                                                warmup=warmup,
+                                                runs=runs)
+
+        # Dropout benchmarks
+        dropout_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "Dropout")],
+                                                     run_backward=True,
+                                                     dtype=dtype,
+                                                     ctx=ctx,
+                                                     profiler=profiler,
+                                                     inputs=[{"data": (2**15, 3, 256, 256),
+                                                              "p": 0.5,
+                                                              "mode": "always"},
+                                                             {"data": (2**28, 16),
+                                                              "p": 0.5,
+                                                              "mode": "always"}],
+                                                     warmup=warmup,
+                                                     runs=runs)
+        # BatchNorm benchmarks
+        batchnorm_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "BatchNorm")],
+                                                       run_backward=True,
+                                                       dtype=dtype,
+                                                       ctx=ctx,
+                                                       profiler=profiler,
+                                                       inputs=[{"data": (2**15, 3, 256, 256),
+                                                                "gamma": (3,),
+                                                                "beta": (3,),
+                                                                "moving_mean": (3,),
+                                                                "moving_var": (3,)},
+                                                               {"data": (2**14, 3, 10000, 10),
+                                                                "gamma": (3,),
+                                                                "beta": (3,),
+                                                                "moving_mean": (3,),
+                                                                "moving_var": (3,)}],
+                                                       warmup=warmup,
+                                                       runs=runs)
+    else:
 
 Review comment:
   It seems the only difference between the if and else branches is the `inputs` argument. Can we generate only the inputs in the if/else branches and pass them to a single call of the same operator function?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373705848
 
 

 ##########
 File path: benchmark/opperf/nd_operations/binary_operators.py
 ##########
 @@ -48,6 +48,8 @@ def run_mx_binary_broadcast_operators_benchmarks(ctx=mx.cpu(), dtype='float32',
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Same here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] ChaiBapchya removed a comment on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya removed a comment on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-579949961
 
 
   Actually, if MXNet is built with LTS ON, then the user can just give >2**32
   as a custom shape and use the opperf utility.
   
   ```
   import mxnet as mx
   from mxnet import nd
   
   from benchmark.opperf.utils.benchmark_utils import run_performance_test
   run_performance_test(nd.add, run_backward=True, dtype='float32', ctx=mx.cpu(),
                                  inputs=[{"lhs": (2**32+1, 1),
                                           "rhs": (2**32+1, 1)}],
                                  warmup=0, runs=1)
   ```
   
   This flag serves as a quick way of testing large tensor ops.
   So, for example, if the user doesn't want to add custom shapes for each operator
   and just wants to see perf times for all operators, then this flag comes in
   handy.
   
   ```
   python incubator-mxnet/benchmark/opperf/opperf.py --output-format json --output-file mxnet_operator_benchmark_results.json --large-tensor ON
   ```
   
   So yes, both are separate use cases and both are possible,
   with the obvious assumption that MXNet is built with USE_INT64_TENSOR_SIZE = ON.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] connorgoggins commented on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing

Posted by GitBox <gi...@apache.org>.
connorgoggins commented on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-593528838
 
 
   @ChaiBapchya thanks for pointing this out. When I ran my tests with this PR on Friday, #17400 hadn't been merged into master yet, so the conflicts did not appear. I believe [your PR](https://github.com/apache/incubator-mxnet/pull/17735) will fix these issues - thanks for your contribution!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] ChaiBapchya edited a comment on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya edited a comment on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-593053525
 
 
   While the full opperf suite was run initially (and the results are linked in the description),
   was the full opperf suite run again after the subsequent commits, such as the newly added ops and merges?
   
   Could you paste the opperf results after commit 256ad70?
   
   Because right now, with master (CUDA, cuDNN ON),
   the full opperf suite runs into errors for various ops:
   1. Optimizer update ops
   2. BatchNorm, because of a cuDNN eps error (see the sketch after this comment)
   ```
   MXNetError: Check failed: param.eps >= 1e-5 (1e-08 vs. 1e-05) : CuDNN requires eps to be no less than 1e-05
   ```
   3. lamb_update_phase1 and lamb_update_phase2
   ```
   Traceback (most recent call last):
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 213, in <module>
       sys.exit(main())
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 193, in main
       benchmark_results = run_all_mxnet_operator_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs)
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 111, in run_all_mxnet_operator_benchmarks
       mxnet_operator_benchmark_results.append(run_optimizer_operators_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs))
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/nd_operations/nn_optimizer_operators.py", line 142, in run_optimizer_operators_benchmarks
       mx_optimizer_op_results = run_op_benchmarks(mx_optimizer_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 210, in run_op_benchmarks
       warmup=warmup, runs=runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 177, in run_performance_test
       benchmark_result = _run_nd_operator_performance_test(op, inputs, run_backward, warmup, runs, kwargs_list, profiler)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 114, in _run_nd_operator_performance_test
       _, _ = benchmark_helper_func(op, warmup, **kwargs_list[0])
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/profiler_utils.py", line 200, in cpp_profile_it
       res = func(*args, **kwargs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/ndarray_utils.py", line 97, in nd_forward_and_profile
       res = op(**kwargs_new)
     File "<string>", line 113, in lamb_update_phase1
     File "/home/ubuntu/incubator-mxnet/python/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
       ctypes.byref(out_stypes)))
     File "/home/ubuntu/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: MXNetError: Required parameter wd of float is not presented, in operator lamb_update_phase1(name="", t="1", rescale_grad="0.4", epsilon="1e-08", beta2="0.1", beta1="0.1")
   *** Error in `python': corrupted double-linked list: 0x000055b58a93f6c0 ***
   ```
   
   The PR that introduced lamb_update_phase1 to opperf (https://github.com/apache/incubator-mxnet/pull/17542) worked with CUDA and cuDNN ON,
   but now it doesn't.
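
   Regarding the cuDNN eps check in item 2 above, a minimal repro sketch (an editor's illustration; it assumes a CUDA/cuDNN build and a GPU context, and the shapes are arbitrary):

   ```
   import mxnet as mx

   ctx = mx.gpu(0)
   data = mx.nd.ones((32, 3, 16, 16), ctx=ctx)
   gamma = mx.nd.ones((3,), ctx=ctx)
   beta = mx.nd.zeros((3,), ctx=ctx)
   moving_mean = mx.nd.zeros((3,), ctx=ctx)
   moving_var = mx.nd.ones((3,), ctx=ctx)

   # eps below 1e-5 (e.g. the 1e-8 used by the failing benchmark input) trips the
   # cuDNN check quoted above; eps >= 1e-5 passes it.
   out = mx.nd.BatchNorm(data, gamma, beta, moving_mean, moving_var, eps=1e-5)
   ```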

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] ChaiBapchya commented on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-579949961
 
 
   Actually, if MXNet is built with LTS ON, then the user can just give >2**32
   as a custom shape and use the opperf utility.
   
   This flag serves as a quick way of testing large tensor ops.
   So, for example, if the user doesn't want to add custom shapes for each operator
   and just wants to see perf times for all operators, then this flag comes in
   handy.
   
   So yes, both are separate use cases and both are possible,
   with the obvious assumption that MXNet is built with USE_INT64_TENSOR_SIZE = ON.
   
   
   On Wed, 29 Jan 2020 at 10:31, Lin Yuan <no...@github.com> wrote:
   
   > Can users specify custom shapes to test the performance of large tensor
   > instead of using a param? That gives more freedom to users.
   >
   
   
   -- 
   *Chaitanya Prakash Bapat*
   *+1 (973) 953-6299*
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373706067
 
 

 ##########
 File path: benchmark/opperf/nd_operations/gemm_operators.py
 ##########
 @@ -44,6 +44,8 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Same here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373706254
 
 

 ##########
 File path: benchmark/opperf/nd_operations/gemm_operators.py
 ##########
 @@ -55,33 +57,62 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
 
     """
     # Benchmark tests for dot and batch_dot operators
-    dot_benchmark_res = run_performance_test(
-        [getattr(MX_OP_MODULE, "dot")], run_backward=True,
-        dtype=dtype, ctx=ctx,
-        inputs=[{"lhs": (1024, 1024),
-                 "rhs": (1024, 1024)},
-                {"lhs": (1000, 10),
-                 "rhs": (1000, 10),
-                 "transpose_b": True},
-                {"lhs": (1000, 1),
-                 "rhs": (100, 1000),
-                 "transpose_a": True,
-                 "transpose_b": True}],
-        warmup=warmup, runs=runs, profiler=profiler)
+    if large_tensor == "on":
 
 Review comment:
   What happens if this flag is ON and the user also specifies custom shapes (which are small tensors)?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] ChaiBapchya edited a comment on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya edited a comment on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-579949961
 
 
   Actually, if MXNet is built with LTS ON, then the user can just give >2**32
   as a custom shape and use the opperf utility.
   
   ```
   import mxnet as mx
   from mxnet import nd
   
   from benchmark.opperf.utils.benchmark_utils import run_performance_test
   run_performance_test(nd.add, run_backward=True, dtype='float32', ctx=mx.cpu(),
                                  inputs=[{"lhs": (2**32+1, 1),
                                           "rhs": (2**32+1, 1)}],
                                  warmup=0, runs=1)
   ```
   
   This flag serves as a quick way of testing large tensor ops.
   So, for example, if the user doesn't want to add custom shapes for each operator
   and just wants to see perf times for all operators, then this flag comes in
   handy.
   
   ```
   python incubator-mxnet/benchmark/opperf/opperf.py --output-format json --output-file mxnet_operator_benchmark_results.json --large-tensor ON
   ```
   
   So yes, both are separate use cases and both are possible,
   with the obvious assumption that MXNet is built with USE_INT64_TENSOR_SIZE = ON.
   
   
   On Wed, 29 Jan 2020 at 10:31, Lin Yuan <no...@github.com> wrote:
   
   > Can users specify custom shapes to test the performance of large tensor
   > instead of using a param? That gives more freedom to users.
   >
   
   
   -- 
   *Chaitanya Prakash Bapat*
   *+1 (973) 953-6299*
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] ChaiBapchya commented on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-578919591
 
 
   Also, let's wait for #17445 and #17444 to merge,
   so that adding the large tensor flag will not break the existing opperf utility.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] connorgoggins commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
connorgoggins commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r371597251
 
 

 ##########
 File path: benchmark/opperf/nd_operations/gemm_operators.py
 ##########
 @@ -55,33 +55,62 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
 
 
 Review comment:
   Good catch, will do.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707696
 
 

 ##########
 File path: benchmark/opperf/nd_operations/reduction_operators.py
 ##########
 @@ -41,6 +41,8 @@ def run_mx_reduction_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profile
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Be more specific please.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] ChaiBapchya edited a comment on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya edited a comment on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-579949961
 
 
   Actually, if MXNet is built with LTS ON, then the user can just give >2**32
   as a custom shape and use the opperf utility.
   
   ```
   import mxnet as mx
   from mxnet import nd
   
   from benchmark.opperf.utils.benchmark_utils import run_performance_test
   run_performance_test(nd.add, run_backward=True, dtype='float32', ctx=mx.cpu(),
                                  inputs=[{"lhs": (2**32+1, 1),
                                           "rhs": (2**32+1, 1)}],
                                  warmup=0, runs=1)
   ```
   
   This flag serves as a quick way of testing large tensor ops.
   So, for example, if the user doesn't want to add custom shapes for each operator
   and just wants to see perf times for all operators, then this flag comes in
   handy.
   
   ```
   python incubator-mxnet/benchmark/opperf/opperf.py --output-format json --output-file mxnet_operator_benchmark_results.json --large-tensor ON
   ```
   
   So yes, both are separate use cases and both are possible,
   with the obvious assumption that MXNet is built with USE_INT64_TENSOR_SIZE = ON.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] connorgoggins commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
connorgoggins commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r371423823
 
 

 ##########
 File path: benchmark/opperf/nd_operations/gemm_operators.py
 ##########
 @@ -55,33 +55,64 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
 
     """
     # Benchmark tests for dot and batch_dot operators
-    dot_benchmark_res = run_performance_test(
-        [getattr(MX_OP_MODULE, "dot")], run_backward=True,
-        dtype=dtype, ctx=ctx,
-        inputs=[{"lhs": (1024, 1024),
-                 "rhs": (1024, 1024)},
-                {"lhs": (1000, 10),
-                 "rhs": (1000, 10),
-                 "transpose_b": True},
-                {"lhs": (1000, 1),
-                 "rhs": (100, 1000),
-                 "transpose_a": True,
-                 "transpose_b": True}],
-        warmup=warmup, runs=runs, profiler=profiler)
+    if large_tensor == "on":
+        print("dot")
 
 Review comment:
   Agreed - just dropped unnecessary print statements.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] ChaiBapchya commented on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-593053525
 
 
   While the full opperf suite was run initially (and the results are linked in the description),
   was the full opperf suite run again after the subsequent commits, such as the newly added ops and merges?
   
   Because right now, with master (CUDA, cuDNN ON),
   the full opperf suite runs into an error for lamb_update_phase1:
   ```
   Traceback (most recent call last):
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 213, in <module>
       sys.exit(main())
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 193, in main
       benchmark_results = run_all_mxnet_operator_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs)
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 111, in run_all_mxnet_operator_benchmarks
       mxnet_operator_benchmark_results.append(run_optimizer_operators_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs))
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/nd_operations/nn_optimizer_operators.py", line 142, in run_optimizer_operators_benchmarks
       mx_optimizer_op_results = run_op_benchmarks(mx_optimizer_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 210, in run_op_benchmarks
       warmup=warmup, runs=runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 177, in run_performance_test
       benchmark_result = _run_nd_operator_performance_test(op, inputs, run_backward, warmup, runs, kwargs_list, profiler)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 114, in _run_nd_operator_performance_test
       _, _ = benchmark_helper_func(op, warmup, **kwargs_list[0])
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/profiler_utils.py", line 200, in cpp_profile_it
       res = func(*args, **kwargs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/ndarray_utils.py", line 97, in nd_forward_and_profile
       res = op(**kwargs_new)
     File "<string>", line 113, in lamb_update_phase1
     File "/home/ubuntu/incubator-mxnet/python/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
       ctypes.byref(out_stypes)))
     File "/home/ubuntu/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: MXNetError: Required parameter wd of float is not presented, in operator lamb_update_phase1(name="", t="1", rescale_grad="0.4", epsilon="1e-08", beta2="0.1", beta1="0.1")
   *** Error in `python': corrupted double-linked list: 0x000055b58a93f6c0 ***
   ```
   
   The PR that introduced lamb_update_phase1 to opperf (https://github.com/apache/incubator-mxnet/pull/17542) worked with CUDA and cuDNN ON,
   but now it doesn't.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] ChaiBapchya edited a comment on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya edited a comment on issue #17449: [Large Tensor] Implemented LT flag for OpPerf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-593053525
 
 
   While the full opperf suite was run initially (and the results are linked in the description),
   was the full opperf suite run again after the subsequent commits, such as the newly added ops and merges?
   
   Could you paste the opperf results after commit 256ad70?
   
   Because right now, with master (CUDA, cuDNN ON),
   the full opperf suite runs into an error for lamb_update_phase1:
   ```
   Traceback (most recent call last):
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 213, in <module>
       sys.exit(main())
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 193, in main
       benchmark_results = run_all_mxnet_operator_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs)
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 111, in run_all_mxnet_operator_benchmarks
       mxnet_operator_benchmark_results.append(run_optimizer_operators_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs))
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/nd_operations/nn_optimizer_operators.py", line 142, in run_optimizer_operators_benchmarks
       mx_optimizer_op_results = run_op_benchmarks(mx_optimizer_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 210, in run_op_benchmarks
       warmup=warmup, runs=runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 177, in run_performance_test
       benchmark_result = _run_nd_operator_performance_test(op, inputs, run_backward, warmup, runs, kwargs_list, profiler)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 114, in _run_nd_operator_performance_test
       _, _ = benchmark_helper_func(op, warmup, **kwargs_list[0])
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/profiler_utils.py", line 200, in cpp_profile_it
       res = func(*args, **kwargs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/ndarray_utils.py", line 97, in nd_forward_and_profile
       res = op(**kwargs_new)
     File "<string>", line 113, in lamb_update_phase1
     File "/home/ubuntu/incubator-mxnet/python/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
       ctypes.byref(out_stypes)))
     File "/home/ubuntu/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: MXNetError: Required parameter wd of float is not presented, in operator lamb_update_phase1(name="", t="1", rescale_grad="0.4", epsilon="1e-08", beta2="0.1", beta1="0.1")
   *** Error in `python': corrupted double-linked list: 0x000055b58a93f6c0 ***
   ```
   
   The PR that introduced lamb_update_phase1 to opperf (https://github.com/apache/incubator-mxnet/pull/17542) worked with CUDA and cuDNN ON,
   but now it doesn't.


[GitHub] [incubator-mxnet] ChaiBapchya commented on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-580019572
 
 
   @apeforest 
   Actually, if MXNet is built with large tensor support (LTS) ON, then the user can just give a shape
   with more than 2**32 elements as a custom shape and use the opperf utility.
   
   ```
   import mxnet as mx
   from mxnet import nd
   
   from benchmark.opperf.utils.benchmark_utils import run_performance_test
   run_performance_test(nd.add, run_backward=True, dtype='float32', ctx=mx.cpu(),
                                  inputs=[{"lhs": (2**32+1, 1),
                                           "rhs": (2**32+1, 1)}],
                                  warmup=0, runs=1)
   ```
   
   This flag serves as a quick way of testing large tensor ops.
   For example, if the user doesn't want to add custom shapes for each operator
   and just wants to see perf times for all operators, this flag comes in
   handy.
   
   ```
   python incubator-mxnet/benchmark/opperf/opperf.py --output-format json --output-file mxnet_operator_benchmark_results.json --large-tensor ON
   ```
   
   So yes, both are separate use cases and both are possible,
   with the obvious assumption that MXNet is built with USE_INT64_TENSOR_SIZE = ON.
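   
   A quick runtime check of that assumption, as a rough sketch (the feature name below
   is the one exposed by mxnet.runtime on recent builds; treat it as an assumption if
   your version differs):
   
   ```
   import mxnet as mx
   
   # Confirm the build actually has int64 (large) tensor support enabled
   features = mx.runtime.Features()
   assert features.is_enabled('INT64_TENSOR_SIZE'), \
       "MXNet was not built with USE_INT64_TENSOR_SIZE=ON; large-tensor shapes will fail"
   ```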
   


[GitHub] [incubator-mxnet] apeforest commented on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-580104099
 
 
   > This flag serves as a quick way of testing large tensor ops.
   
   Can you think of a use case where a customer would want such a quick way instead of specifying a custom shape to test an operator? If I were a customer and wanted to know whether an operator would meet the requirements of my input tensor (which could be large), I would just specify the shape and test it. Using a flag `--large_tensor` is rather vague to me. What does it mean, and how large is *LARGE*?


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707350
 
 

 ##########
 File path: benchmark/opperf/nd_operations/nn_activation_operators.py
 ##########
 @@ -55,55 +57,106 @@ def run_activation_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler=
     Dictionary of results. Key -> Name of the operator, Value -> Benchmark results.
 
     """
-    # Relu and its variation
-    relu_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "LeakyReLU")],
-                                              run_backward=True,
-                                              dtype=dtype,
-                                              ctx=ctx,
-                                              profiler=profiler,
-                                              inputs=[{"data": (1024, 1024), "act_type": "leaky", "slope": 0.1},
-                                                      {"data": (10000, 1), "act_type": "leaky", "slope": 0.1},
-                                                      {"data": (10000, 100), "act_type": "leaky", "slope": 0.1},
-                                                      {"data": (1024, 1024), "act_type": "elu", "slope": 0.1},
-                                                      {"data": (10000, 1), "act_type": "elu", "slope": 0.1},
-                                                      {"data": (10000, 100), "act_type": "elu", "slope": 0.1},
-                                                      {"data": (1024, 1024), "act_type": "selu"},
-                                                      {"data": (10000, 1), "act_type": "selu"},
-                                                      {"data": (10000, 100), "act_type": "selu"},
-                                                      {"data": (1024, 1024), "act_type": "prelu", "gamma": (1, 1024)},
-                                                      {"data": (10000, 1), "act_type": "prelu", "gamma": (1, 1)},
-                                                      {"data": (10000, 100), "act_type": "prelu", "gamma": (1, 100)}
-                                                      ],
-                                              warmup=warmup,
-                                              runs=runs)
+    if large_tensor == 'on':
+        # Relu and its variation
+        relu_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "LeakyReLU")],
+                                                  run_backward=True,
+                                                  dtype=dtype,
+                                                  ctx=ctx,
+                                                  profiler=profiler,
+                                                  inputs=[{"data": (2**16, 2**16), "act_type": "leaky", "slope": 0.1},
+                                                          {"data": (2**4, 2**28), "act_type": "leaky", "slope": 0.1},
+                                                          {"data": (4, 2**30), "act_type": "leaky", "slope": 0.1},
+                                                          {"data": (2**16, 2**16), "act_type": "elu", "slope": 0.1},
+                                                          {"data": (2**4, 2**28), "act_type": "elu", "slope": 0.1},
+                                                          {"data": (4, 2**30), "act_type": "elu", "slope": 0.1},
+                                                          {"data": (2**16, 2**16), "act_type": "selu"},
+                                                          {"data": (2**4, 2**28), "act_type": "selu"},
+                                                          {"data": (4, 2**30), "act_type": "selu"},
+                                                          {"data": (2**16, 2**16), "act_type": "prelu", "gamma": (1, 2**16)},
+                                                          {"data": (2**4, 2**28), "act_type": "prelu", "gamma": (1, 2**28)},
+                                                          {"data": (4, 2**30), "act_type": "prelu", "gamma": (1, 2**30)}
+                                                         ],
+                                                  warmup=warmup,
+                                                  runs=runs)
 
-    # Sigmoid => Covered as part of Unary ops
-    # Hard_Sigmoid
-    hard_sigmoid_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "hard_sigmoid")],
-                                                      run_backward=True,
-                                                      dtype=dtype,
-                                                      ctx=ctx,
-                                                      profiler=profiler,
-                                                      inputs=[{"data": (1024, 1024), "alpha": 0.25, "beta": 0.5},
-                                                              {"data": (10000, 1), "alpha": 0.25, "beta": 0.5},
-                                                              {"data": (10000, 100), "alpha": 0.25, "beta": 0.5}
-                                                              ],
-                                                      warmup=warmup,
-                                                      runs=runs)
+        # Sigmoid => Covered as part of Unary ops
+        # Hard_Sigmoid
+        hard_sigmoid_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "hard_sigmoid")],
+                                                          run_backward=True,
+                                                          dtype=dtype,
+                                                          ctx=ctx,
+                                                          profiler=profiler,
+                                                          inputs=[{"data": (2**16, 2**16), "alpha": 0.25, "beta": 0.5},
+                                                                  {"data": (2**4, 2**28), "alpha": 0.25, "beta": 0.5},
+                                                                  {"data": (4, 2**30), "alpha": 0.25, "beta": 0.5}
+                                                                 ],
+                                                          warmup=warmup,
+                                                          runs=runs)
 
-    # Softmax, LogSoftmax
-    softmax_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "softmax"),
-                                                  getattr(MX_OP_MODULE, "log_softmax")],
-                                                 run_backward=True,
-                                                 dtype=dtype,
-                                                 ctx=ctx,
-                                                 profiler=profiler,
-                                                 inputs=[{"data": (1024, 1024), "axis": -1, "temperature": 0.5},
-                                                         {"data": (10000, 1), "axis": -1, "temperature": 0.5},
-                                                         {"data": (10000, 100), "axis": -1, "temperature": 0.5}
+        # Softmax, LogSoftmax
+        softmax_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "softmax"),
+                                                      getattr(MX_OP_MODULE, "log_softmax")],
+                                                     run_backward=True,
+                                                     dtype=dtype,
+                                                     ctx=ctx,
+                                                     profiler=profiler,
+                                                     inputs=[{"data": (2**16, 2**16), "axis": -1, "temperature": 0.5},
+                                                             {"data": (2**4, 2**28), "axis": -1, "temperature": 0.5},
+                                                             {"data": (4, 2**30), "axis": -1, "temperature": 0.5}
+                                                            ],
+                                                     warmup=warmup,
+                                                     runs=runs)
+    else:
 
 Review comment:
   It seems the only difference between the if and else branches is the `inputs` argument. Can we generate only the differing inputs in the if/else branch and then pass them to the same operator benchmark call?
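   
   As a rough sketch of that refactoring (the helper names mirror the ones this file
   already uses, the import path for run_performance_test is the one shown earlier in
   this thread, and the shape lists are abbreviated for illustration):
   
   ```
   import mxnet as mx
   from benchmark.opperf.utils.benchmark_utils import run_performance_test
   
   # Stand-in for the operator module handle the file already defines
   MX_OP_MODULE = mx.nd
   
   def run_activation_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native',
                                           large_tensor='off', warmup=25, runs=100):
       # Only the input shapes differ between the two branches
       if large_tensor == 'on':
           leaky_relu_inputs = [{"data": (2**16, 2**16), "act_type": "leaky", "slope": 0.1},
                                {"data": (2**4, 2**28), "act_type": "leaky", "slope": 0.1}]
       else:
           leaky_relu_inputs = [{"data": (1024, 1024), "act_type": "leaky", "slope": 0.1},
                                {"data": (10000, 100), "act_type": "leaky", "slope": 0.1}]
   
       # Single call site for the benchmark, shared by both branches
       relu_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "LeakyReLU")],
                                                 run_backward=True, dtype=dtype, ctx=ctx,
                                                 profiler=profiler, inputs=leaky_relu_inputs,
                                                 warmup=warmup, runs=runs)
       return relu_benchmark_res
   ```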


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r371594197
 
 

 ##########
 File path: benchmark/opperf/nd_operations/gemm_operators.py
 ##########
 @@ -55,33 +55,62 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
 
 
 Review comment:
   Please add the large_tensor parameter to the docstring description.


[GitHub] [incubator-mxnet] ChaiBapchya commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r371420014
 
 

 ##########
 File path: benchmark/opperf/nd_operations/nn_activation_operators.py
 ##########
 @@ -55,55 +55,106 @@ def run_activation_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler=
     Dictionary of results. Key -> Name of the operator, Value -> Benchmark results.
 
     """
-    # Relu and its variation
-    relu_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "LeakyReLU")],
-                                              run_backward=True,
-                                              dtype=dtype,
-                                              ctx=ctx,
-                                              profiler=profiler,
-                                              inputs=[{"data": (1024, 1024), "act_type": "leaky", "slope": 0.1},
-                                                      {"data": (10000, 1), "act_type": "leaky", "slope": 0.1},
-                                                      {"data": (10000, 100), "act_type": "leaky", "slope": 0.1},
-                                                      {"data": (1024, 1024), "act_type": "elu", "slope": 0.1},
-                                                      {"data": (10000, 1), "act_type": "elu", "slope": 0.1},
-                                                      {"data": (10000, 100), "act_type": "elu", "slope": 0.1},
-                                                      {"data": (1024, 1024), "act_type": "selu"},
-                                                      {"data": (10000, 1), "act_type": "selu"},
-                                                      {"data": (10000, 100), "act_type": "selu"},
-                                                      {"data": (1024, 1024), "act_type": "prelu", "gamma": (1, 1024)},
-                                                      {"data": (10000, 1), "act_type": "prelu", "gamma": (1, 1)},
-                                                      {"data": (10000, 100), "act_type": "prelu", "gamma": (1, 100)}
-                                                      ],
-                                              warmup=warmup,
-                                              runs=runs)
+    if large_tensor == 'on':
+        # Relu and its variation
+        relu_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "LeakyReLU")],
+                                                run_backward=True,
+                                                dtype=dtype,
+                                                ctx=ctx,
+                                                profiler=profiler,
+                                                inputs=[{"data": (2**16, 2**16), "act_type": "leaky", "slope": 0.1},
+                                                        {"data": (2**4, 2**28), "act_type": "leaky", "slope": 0.1},
+                                                        {"data": (4, 2**30), "act_type": "leaky", "slope": 0.1},
+                                                        {"data": (2**16, 2**16), "act_type": "elu", "slope": 0.1},
+                                                        {"data": (2**4, 2**28), "act_type": "elu", "slope": 0.1},
+                                                        {"data": (4, 2**30), "act_type": "elu", "slope": 0.1},
+                                                        {"data": (2**16, 2**16), "act_type": "selu"},
+                                                        {"data": (2**4, 2**28), "act_type": "selu"},
+                                                        {"data": (4, 2**30), "act_type": "selu"},
+                                                        {"data": (2**16, 2**16), "act_type": "prelu", "gamma": (1, 2**16)},
+                                                        {"data": (2**4, 2**28), "act_type": "prelu", "gamma": (1, 2**28)},
+                                                        {"data": (4, 2**30), "act_type": "prelu", "gamma": (1, 2**30)}
+                                                        ],
+                                                warmup=warmup,
+                                                runs=runs)
 
-    # Sigmoid => Covered as part of Unary ops
-    # Hard_Sigmoid
-    hard_sigmoid_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "hard_sigmoid")],
-                                                      run_backward=True,
-                                                      dtype=dtype,
-                                                      ctx=ctx,
-                                                      profiler=profiler,
-                                                      inputs=[{"data": (1024, 1024), "alpha": 0.25, "beta": 0.5},
-                                                              {"data": (10000, 1), "alpha": 0.25, "beta": 0.5},
-                                                              {"data": (10000, 100), "alpha": 0.25, "beta": 0.5}
-                                                              ],
-                                                      warmup=warmup,
-                                                      runs=runs)
+        # Sigmoid => Covered as part of Unary ops
+        # Hard_Sigmoid
+        hard_sigmoid_benchmark_res = run_performance_test([getattr(MX_OP_MODULE, "hard_sigmoid")],
+                                                        run_backward=True,
 
 Review comment:
   nitpick: fix the indentation here.
   
   Run `make pylint` to catch these issues.


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707656
 
 

 ##########
 File path: benchmark/opperf/nd_operations/random_sampling_operators.py
 ##########
 @@ -44,6 +44,8 @@ def run_mx_random_sampling_operators_benchmarks(ctx=mx.cpu(), dtype='float32', p
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Be more specific please.
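   
   For example, a more specific wording for that docstring entry might look like the
   following (the phrasing is a suggestion, not the PR's final text):
   
   ```
   large_tensor: str, default 'off'
       If 'on', use int64-indexed default input shapes (tensors with more than
       2**32 elements) for the benchmarks; requires MXNet built with
       USE_INT64_TENSOR_SIZE=ON. Accepts 'on' or 'off'.
   ```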


[GitHub] [incubator-mxnet] apeforest commented on issue #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on issue #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#issuecomment-579896463
 
 
   Can users specify custom shapes to test large tensor performance instead of using a param? That would give users more freedom.


[GitHub] [incubator-mxnet] apeforest merged pull request #17449: [Large Tensor] Implemented LT flag for OpPerf testing

Posted by GitBox <gi...@apache.org>.
apeforest merged pull request #17449: [Large Tensor] Implemented LT flag for OpPerf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449
 
 
   


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707604
 
 

 ##########
 File path: benchmark/opperf/nd_operations/nn_optimizer_operators.py
 ##########
 @@ -46,6 +46,8 @@ def run_optimizer_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Be more specific please.


[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing

Posted by GitBox <gi...@apache.org>.
apeforest commented on a change in pull request #17449: Implemented large tensor flag for opperf testing
URL: https://github.com/apache/incubator-mxnet/pull/17449#discussion_r373707761
 
 

 ##########
 File path: benchmark/opperf/nd_operations/unary_operators.py
 ##########
 @@ -45,6 +45,8 @@ def run_mx_unary_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='n
         Context to run benchmarks
     dtype: str, default 'float32'
         Precision to use for benchmarks
+    large_tensor: str, default 'off'
+        Tensor size to use for tests
 
 Review comment:
   Be more specific please.
