You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/08/17 13:19:00 UTC

[GitHub] [incubator-mxnet] bgawrych opened a new pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

bgawrych opened a new pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533


   ## Description ##
   This PR is continuation of https://github.com/apache/incubator-mxnet/pull/20450, where enabling BRGEMM automatically was requested. 
   
   
   Following script was used to benchmark - only single iteration was benchmarked to avoid measuring performance with cached reordered weights for BRGEMM.
   
   ```
   import mxnet as mx
   from mxnet import nd
   from mxnet.gluon import nn
   import time
   
   class CalibIter(mx.io.DataIter):
       def __init__(self, batch, data_shape, batch_size):
           super(CalibIter, self).__init__(batch_size)
           self.label_shape = (batch_size,)
           self.data_shape = data_shape
           if isinstance(data_shape, tuple):
             self.provide_data = [('data', data_shape)]
           else:
             self.provide_data = data_shape
           self.provide_label = []
           self.batch = batch
   
       def __iter__(self):
           yield self.batch
   
   def test_qfc():
       results = dict()
       N  = [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048,
             4096, 8192, 9216, 10240, 11264, 12288, 13312,
             14336, 15360, 16384, 32768, 65536, 131072]
       IC = [32, 64, 128, 256, 512, 1024, 2048, 4096]
       OC = [32, 64, 128, 256, 512, 1024, 2048, 4096]
   
       for ic in IC:
           for oc in OC:
               net = nn.Dense(units=oc, flatten=True,
                              weight_initializer=mx.init.Normal(),
                              bias_initializer=mx.init.Normal())
               net.initialize()
               net.hybridize(static_alloc=True, static_shape=True)
               x = mx.nd.random_uniform(shape=(1, ic), low=-1.0, high=1.0)
               net(x)
               batch = mx.io.DataBatch([x])
               calib_data = CalibIter(batch, [mx.io.DataDesc("data", shape=(1, ic), dtype='float32')], 1)
               net_quantized = mx.contrib.quant.quantize_net_v2(net, quantized_dtype='auto',
                                                                exclude_layers=None,
                                                                exclude_layers_match=None,
                                                                calib_data=calib_data,
                                                                calib_mode='naive',
                                                                num_calib_examples=1,
                                                                ctx=mx.current_context())
               net_quantized.hybridize(static_alloc=True, static_shape=True)
               mx.nd.waitall()
               total = 0
               for bs in N:
                   x = mx.nd.random_uniform(shape=(bs, ic), low=-1.0, high=1.0)
                   mx.nd.waitall()
                   tic = time.time()
                   o = net_quantized(x)
                   o.wait_to_read()
                   total = time.time() - tic
                   results[(bs, ic, oc)] = total
   
       for k,v in results.items():
           print(f"{k[0]};{k[1]};{k[2]};{v}")
   
   test_qfc()
   ```
   Attaching spreadsheet with results on CLX8280 (before this change). 
   [brgemm_igemm_cmp.xlsx](https://github.com/apache/incubator-mxnet/files/7000191/brgemm_igemm_cmp.xlsx)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693993788



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&

Review comment:
       I've just filtered attached spreadsheet to find boundaries where significant majority of results are "green"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#issuecomment-901100592


   @szha, @TaoLv Can you review? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bartekkuncer commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bartekkuncer commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r692193136



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurment results done on CLX8280

Review comment:
       I believe there is 'e' missing in measurement.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r694002398



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;

Review comment:
       > partly disagree: it is simple notation to show that particular bit is set or we used power of 2 value. But in this particular example 8192 is also good representation.
   
   @anko-intel This is not exactly true :) - this is x^2^k = x ^ (bmbm-1,....,b0)_2 :) 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693912659



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;
+}
+
+inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr,
+                                                   size_t batch_size,
+                                                   int dtype = -1) {

Review comment:
       What does the value (-1) mean? 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693974655



##########
File path: docs/static_site/src/pages/api/faq/env_var.md
##########
@@ -298,9 +298,9 @@ If ctypes is used, it must be `mxnet._ctypes.ndarray.NDArrayBase`.
   - Values: Int ```(default=-1)```
   - Flag to set num of elements that MKLDNN cache can hold. Default is -1 which means cache size is unbounded. Should only be set if your model has variable input shapes, as cache size may grow unbounded. The number represents the number of items in the cache and is proportional to the number of layers that use MKLDNN and different input shape.
 
-* MXNET_MKLDNN_DISABLE_BRGEMM_FC
-  - Values: 0, 1 ```(default=1)```
-  - Flag which disables BRGEMM kernels in FullyConnected executed with MKLDNN support - Should only be set to 0 if your model has constant input shapes or FullyConnected is calculated with large tensors. Supported on machines with AVX512-VNNI.
+* MXNET_MKLDNN_FORCE_FC_AB_FORMAT
+  - Values: 0, 1 ```(default=0)```
+  - If set to true, FullyConnected will use only AB format for weights, thus MXNet won't use BRGEMM implementation of FC on machines with AVX512-VNNI support.

Review comment:
       it will change completely the meaning of the sentence




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r694551661



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&

Review comment:
       Do the perf results are repeated? The perf result for all dims is really close, I guess that perf comes from ::ab format, not it comes from its internal implementation [?] 
   
   So, what would happen if the cache mechanism is enabled? (how do the differences look like: results are close, are completely different or are the same) [?]
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] anko-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

anko-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r691227672



##########
File path: src/operator/nn/mkldnn/mkldnn_fully_connected.cc
##########
@@ -42,9 +42,10 @@ mkldnn::inner_product_forward::primitive_desc GetFCFwdImpl(const MKLDNNFCFullPar
                                                            const NDArray* bias,
                                                            const mkldnn::memory::desc& out_md) {
   auto data_md   = GetMemDesc(data);
-  auto weight_md = full_param.mkldnn_param.quantized ? GetFCWeightDesc(weight, mshadow::kInt8)
-                                                     : GetFCWeightDesc(weight);
-  auto engine    = CpuEngine::Get()->get_engine();
+  auto weight_md = full_param.mkldnn_param.quantized
+                       ? GetFCWeightDesc(weight, data.shape()[0], mshadow::kInt8)
+                       : GetFCWeightDesc(weight, data.shape()[0]);
+  auto engine = CpuEngine::Get()->get_engine();

Review comment:
       This line could stay aligned on "="

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,26 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool SupportBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;
+}
+
+inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr,
+                                                   size_t batch_size,
+                                                   int dtype = -1) {
   int ndim = arr.shape().ndim();
   mkldnn::memory::dims dims(ndim);
   dtype = (dtype == -1) ? arr.dtype() : dtype;
   for (size_t i = 0; i < dims.size(); i++)
     dims[i] = arr.shape()[i];
   auto format = mkldnn::memory::format_tag::any;
   // for batch 256 alexnet benchmark test
-  const bool brgemm_disabled = dmlc::GetEnv("MXNET_MKLDNN_DISABLE_BRGEMM_FC", true);
-  if (dims.size() == 2 && brgemm_disabled) {
-    format = mkldnn::memory::format_tag::ab;
+  const bool force_fc_ab_format = dmlc::GetEnv("MXNET_MKLDNN_FORCE_FC_AB_FORMAT", false);

Review comment:
       please update  docs/static_site/src/pages/api/faq/env_var.md

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,26 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool SupportBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {

Review comment:
       ```suggestion
   inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
   ```
   and maybe some comments that this function takes brgemm implementation when it it is faster than ...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693973292



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {

Review comment:
       done

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693963273



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;
+}
+
+inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr,
+                                                   size_t batch_size,
+                                                   int dtype = -1) {

Review comment:
       -1 means that dtype for mkldnn descriptor should be taken from NDArray 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] anko-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

anko-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r692030929



##########
File path: src/operator/nn/mkldnn/mkldnn_fully_connected.cc
##########
@@ -42,9 +42,10 @@ mkldnn::inner_product_forward::primitive_desc GetFCFwdImpl(const MKLDNNFCFullPar
                                                            const NDArray* bias,
                                                            const mkldnn::memory::desc& out_md) {
   auto data_md   = GetMemDesc(data);
-  auto weight_md = full_param.mkldnn_param.quantized ? GetFCWeightDesc(weight, mshadow::kInt8)
-                                                     : GetFCWeightDesc(weight);
-  auto engine    = CpuEngine::Get()->get_engine();
+  auto weight_md = full_param.mkldnn_param.quantized
+                       ? GetFCWeightDesc(weight, data.shape()[0], mshadow::kInt8)
+                       : GetFCWeightDesc(weight, data.shape()[0]);
+  auto engine = CpuEngine::Get()->get_engine();

Review comment:
       Please check if you have code formatted according to https://github.com/apache/incubator-mxnet/pull/20433/files




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r694551661



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&

Review comment:
       Do the perf results are repeated? The perf result for all dims is really close, I guess that perf comes from format settings, not it comes from its internal implementation [?] 
   
   So, what would happen if the cache mechanism is enabled? (how do the differences look like: results are close, are completely different or are the same) [?]
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] anko-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

anko-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r691227672



##########
File path: src/operator/nn/mkldnn/mkldnn_fully_connected.cc
##########
@@ -42,9 +42,10 @@ mkldnn::inner_product_forward::primitive_desc GetFCFwdImpl(const MKLDNNFCFullPar
                                                            const NDArray* bias,
                                                            const mkldnn::memory::desc& out_md) {
   auto data_md   = GetMemDesc(data);
-  auto weight_md = full_param.mkldnn_param.quantized ? GetFCWeightDesc(weight, mshadow::kInt8)
-                                                     : GetFCWeightDesc(weight);
-  auto engine    = CpuEngine::Get()->get_engine();
+  auto weight_md = full_param.mkldnn_param.quantized
+                       ? GetFCWeightDesc(weight, data.shape()[0], mshadow::kInt8)
+                       : GetFCWeightDesc(weight, data.shape()[0]);
+  auto engine = CpuEngine::Get()->get_engine();

Review comment:
       This line could stay aligned on "="

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,26 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool SupportBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;
+}
+
+inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr,
+                                                   size_t batch_size,
+                                                   int dtype = -1) {
   int ndim = arr.shape().ndim();
   mkldnn::memory::dims dims(ndim);
   dtype = (dtype == -1) ? arr.dtype() : dtype;
   for (size_t i = 0; i < dims.size(); i++)
     dims[i] = arr.shape()[i];
   auto format = mkldnn::memory::format_tag::any;
   // for batch 256 alexnet benchmark test
-  const bool brgemm_disabled = dmlc::GetEnv("MXNET_MKLDNN_DISABLE_BRGEMM_FC", true);
-  if (dims.size() == 2 && brgemm_disabled) {
-    format = mkldnn::memory::format_tag::ab;
+  const bool force_fc_ab_format = dmlc::GetEnv("MXNET_MKLDNN_FORCE_FC_AB_FORMAT", false);

Review comment:
       please update  docs/static_site/src/pages/api/faq/env_var.md

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,26 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool SupportBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {

Review comment:
       ```suggestion
   inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
   ```
   and maybe some comments that this function takes brgemm implementation when it it is faster than ...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r694668740



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&

Review comment:
       Running 100 iterations of benchmark which allows to utilize caching mechanism shows that BRGEMM is faster in majority of cases (1043/1472) - when running only single iteration with specific shape it's only (423/1472)
   Below are results for current conditions (left 100 iterations / right 1 iteration) and also I'm attaching spreadsheet
   [brgemm_igemm_cmp_2.xlsx](https://github.com/apache/incubator-mxnet/files/7038007/brgemm_igemm_cmp_2.xlsx)
   
   ![image](https://user-images.githubusercontent.com/59644968/130590180-abbe0dae-d45b-4ec5-9f4d-f6f33a33c6b5.png)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693973623



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r694551661



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&

Review comment:
       Do the perf results are repeated? The perf result for all dims are really close, I guess that perf comes from format settings, not it comes from its internal implementation [?] 
   
   So, what would happen if the cache mechanism is enabled? (how do the differences look like: results are close, are completely different or are the same) [?]
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] akarbown merged pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

akarbown merged pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693908115



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&

Review comment:
       Those comprehensive tests have shown us a lot of things. Could you please tell me what the criteria of acceptance are being used here? For brevity: how the boundaries are determined?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693914049



##########
File path: docs/static_site/src/pages/api/faq/env_var.md
##########
@@ -298,9 +298,9 @@ If ctypes is used, it must be `mxnet._ctypes.ndarray.NDArrayBase`.
   - Values: Int ```(default=-1)```
   - Flag to set num of elements that MKLDNN cache can hold. Default is -1 which means cache size is unbounded. Should only be set if your model has variable input shapes, as cache size may grow unbounded. The number represents the number of items in the cache and is proportional to the number of layers that use MKLDNN and different input shape.
 
-* MXNET_MKLDNN_DISABLE_BRGEMM_FC
-  - Values: 0, 1 ```(default=1)```
-  - Flag which disables BRGEMM kernels in FullyConnected executed with MKLDNN support - Should only be set to 0 if your model has constant input shapes or FullyConnected is calculated with large tensors. Supported on machines with AVX512-VNNI.
+* MXNET_MKLDNN_FORCE_FC_AB_FORMAT
+  - Values: 0, 1 ```(default=0)```
+  - If set to true, FullyConnected will use only AB format for weights, thus MXNet won't use BRGEMM implementation of FC on machines with AVX512-VNNI support.

Review comment:
       thus -> Otherwise [?]
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#issuecomment-901100592


   @szha, @TaoLv Can you review? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r694168196



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims& weight_dims, size_t batch_size) {

Review comment:
       const reference




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mxnet-bot commented on pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#issuecomment-900291528


   Hey @bgawrych , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [miscellaneous, sanity, centos-cpu, unix-gpu, clang, edge, centos-gpu, windows-cpu, website, windows-gpu, unix-cpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r694002398



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;

Review comment:
       > partly disagree: it is simple notation to show that particular bit is set or we used power of 2 value. But in this particular example 8192 is also good representation.
   
   @anko-intel This is not exactly true :) - this is x^2^k = x ^ (bmbm-1,....,b0)_2 :) 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] anko-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

anko-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r692729596



##########
File path: src/operator/nn/mkldnn/mkldnn_fully_connected.cc
##########
@@ -42,9 +42,10 @@ mkldnn::inner_product_forward::primitive_desc GetFCFwdImpl(const MKLDNNFCFullPar
                                                            const NDArray* bias,
                                                            const mkldnn::memory::desc& out_md) {
   auto data_md   = GetMemDesc(data);
-  auto weight_md = full_param.mkldnn_param.quantized ? GetFCWeightDesc(weight, mshadow::kInt8)
-                                                     : GetFCWeightDesc(weight);
-  auto engine    = CpuEngine::Get()->get_engine();
+  auto weight_md = full_param.mkldnn_param.quantized
+                       ? GetFCWeightDesc(weight, data.shape()[0], mshadow::kInt8)
+                       : GetFCWeightDesc(weight, data.shape()[0]);
+  auto engine = CpuEngine::Get()->get_engine();

Review comment:
       Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r692685479



##########
File path: src/operator/nn/mkldnn/mkldnn_fully_connected.cc
##########
@@ -42,9 +42,10 @@ mkldnn::inner_product_forward::primitive_desc GetFCFwdImpl(const MKLDNNFCFullPar
                                                            const NDArray* bias,
                                                            const mkldnn::memory::desc& out_md) {
   auto data_md   = GetMemDesc(data);
-  auto weight_md = full_param.mkldnn_param.quantized ? GetFCWeightDesc(weight, mshadow::kInt8)
-                                                     : GetFCWeightDesc(weight);
-  auto engine    = CpuEngine::Get()->get_engine();
+  auto weight_md = full_param.mkldnn_param.quantized
+                       ? GetFCWeightDesc(weight, data.shape()[0], mshadow::kInt8)
+                       : GetFCWeightDesc(weight, data.shape()[0]);
+  auto engine = CpuEngine::Get()->get_engine();

Review comment:
       It only align variables that are one below another without additional lines between. I moved engine to the top of the function and it's aligned now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r692681460



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurment results done on CLX8280

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693905361



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;

Review comment:
       2<<13: What are the advantages of using this specific notation? 2<<13 - this mathematical notation stay so hard to read.
   
   

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&

Review comment:
       There's better to check weight_dims[0] >= 1024 firstly to avoid '%".

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {

Review comment:
       How about passing weight_dims by reference? It looks like that dims is a vector.

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&

Review comment:
       Those comprehensive test have shown us a lot of things. Could you please tell me what the criteria of acceptance are being used here? For brevity: how the boundaries are determined?

##########
File path: docs/static_site/src/pages/api/faq/env_var.md
##########
@@ -298,9 +298,9 @@ If ctypes is used, it must be `mxnet._ctypes.ndarray.NDArrayBase`.
   - Values: Int ```(default=-1)```
   - Flag to set num of elements that MKLDNN cache can hold. Default is -1 which means cache size is unbounded. Should only be set if your model has variable input shapes, as cache size may grow unbounded. The number represents the number of items in the cache and is proportional to the number of layers that use MKLDNN and different input shape.
 
-* MXNET_MKLDNN_DISABLE_BRGEMM_FC
-  - Values: 0, 1 ```(default=1)```
-  - Flag which disables BRGEMM kernels in FullyConnected executed with MKLDNN support - Should only be set to 0 if your model has constant input shapes or FullyConnected is calculated with large tensors. Supported on machines with AVX512-VNNI.
+* MXNET_MKLDNN_FORCE_FC_AB_FORMAT
+  - Values: 0, 1 ```(default=0)```
+  - If set to true, FullyConnected will use only AB format for weights, thus MXNet won't use BRGEMM implementation of FC on machines with AVX512-VNNI support.

Review comment:
       thus -> Otherwise
   

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;
+}
+
+inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr,
+                                                   size_t batch_size,
+                                                   int dtype = -1) {

Review comment:
       Whats does the value (-1) mean? 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] anko-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

anko-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693940147



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;

Review comment:
       partly disagree: it is simple notation to show that particular bit is set or we used power of 2 value. But in this particular example 8192 is also good representation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693941982



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  // Conditions based on measurement results done on CLX8280
+  // https://github.com/apache/incubator-mxnet/pull/20533
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;

Review comment:
       @anko-intel it's 16384 :D - i will change it to normal number




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] mozga-intel commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

mozga-intel commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r693903136



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,28 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool ChooseBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {

Review comment:
       How about passing the weight_dims by reference? It looks like that dims is a vector.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #20533: [v1.x] Enabling BRGEMM FullyConnected based on shapes

Posted by GitBox <gi...@apache.org>.

bgawrych commented on a change in pull request #20533:
URL: https://github.com/apache/incubator-mxnet/pull/20533#discussion_r691824518



##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,26 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool SupportBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {
+  return weight_dims[0] % 64 == 0 && weight_dims[1] % 64 == 0 && weight_dims[0] >= 1024 &&
+         weight_dims[1] >= 1024 && batch_size >= 2 << 13;
+}
+
+inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr,
+                                                   size_t batch_size,
+                                                   int dtype = -1) {
   int ndim = arr.shape().ndim();
   mkldnn::memory::dims dims(ndim);
   dtype = (dtype == -1) ? arr.dtype() : dtype;
   for (size_t i = 0; i < dims.size(); i++)
     dims[i] = arr.shape()[i];
   auto format = mkldnn::memory::format_tag::any;
   // for batch 256 alexnet benchmark test
-  const bool brgemm_disabled = dmlc::GetEnv("MXNET_MKLDNN_DISABLE_BRGEMM_FC", true);
-  if (dims.size() == 2 && brgemm_disabled) {
-    format = mkldnn::memory::format_tag::ab;
+  const bool force_fc_ab_format = dmlc::GetEnv("MXNET_MKLDNN_FORCE_FC_AB_FORMAT", false);

Review comment:
       done

##########
File path: src/operator/nn/mkldnn/mkldnn_fully_connected.cc
##########
@@ -42,9 +42,10 @@ mkldnn::inner_product_forward::primitive_desc GetFCFwdImpl(const MKLDNNFCFullPar
                                                            const NDArray* bias,
                                                            const mkldnn::memory::desc& out_md) {
   auto data_md   = GetMemDesc(data);
-  auto weight_md = full_param.mkldnn_param.quantized ? GetFCWeightDesc(weight, mshadow::kInt8)
-                                                     : GetFCWeightDesc(weight);
-  auto engine    = CpuEngine::Get()->get_engine();
+  auto weight_md = full_param.mkldnn_param.quantized
+                       ? GetFCWeightDesc(weight, data.shape()[0], mshadow::kInt8)
+                       : GetFCWeightDesc(weight, data.shape()[0]);
+  auto engine = CpuEngine::Get()->get_engine();

Review comment:
       This is how clang-formatter works, should I ignore it and align manually?

##########
File path: src/operator/nn/mkldnn/mkldnn_base-inl.h
##########
@@ -305,17 +305,26 @@ inline static mkldnn::memory::desc GetMemDesc(const NDArray& arr, int dtype = -1
   return mkldnn::memory::desc{dims, get_mkldnn_type(dtype), mkldnn::memory::format_tag::any};
 }
 
-inline static mkldnn::memory::desc GetFCWeightDesc(const NDArray& arr, int dtype = -1) {
+inline static bool SupportBRGEMMImpl(mkldnn::memory::dims weight_dims, size_t batch_size) {

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org