Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/01/11 00:35:46 UTC

[GitHub] [incubator-mxnet] rondogency opened a new pull request #17270: [WIP] Dynamic custom operator GPU support

rondogency opened a new pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270
 
 
   ## Description ##
   Add GPU support for custom operators so that users can write custom operators that run on the GPU.
   This is a continuation of the custom operator project https://github.com/apache/incubator-mxnet/pull/15921
   
   ## Design ##
   The main constraint is that the operator's CUDA code and the custom operator registration must be compiled together by NVCC.
   
   Working backward from the user: the user creates a single .cu file and registers a single operator that contains both the CPU and GPU computation logic. The registration is the same as for CPU-only operators:
   `REGISTER_OP(my_relu)`
   
   The user dispatches to the appropriate kernel by checking the MXTensor context in the custom operator's forward/backward function. Using a simple relu example:
   ```
   if (inputs[0].ctx.dev_type == "gpu") {
       cudaStream_t gpu_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
       int64_t N = inputs[0].size();
       int block = 256;
       int grid = (N + block - 1) / block;
       relu_gpu_forward<<<grid,block,0,gpu_stream>>>(out_data, in_data, N);
   } else {
       relu_cpu_forward(out_data, in_data, inputs[0].size());
   }
   ```
   
   Then the user writes the CUDA kernel for the GPU path in the same file. Everything in the kernel body runs on the GPU.
   ```
   __global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
       int tid = blockIdx.x * blockDim.x + threadIdx.x;
       if (tid < N) {
           out[tid] = in[tid] > 0 ? in[tid] : 0;
       }
   }
   ```
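   
   For completeness, here is a minimal sketch of how the rest of the registration could look in the same .cu file. It assumes the per-context setForward/setBackward setters discussed in the review comments below, plus the usual parseAttrs/inferType/inferShape callbacks from the existing lib_custom_op examples; the names are illustrative, not the exact final API.
   ```
   // hedged sketch: forwardCPU/forwardGPU/backwardCPU/backwardGPU are the
   // per-context compute functions, as in the relu_lib.cu example in this PR
   REGISTER_OP(my_relu)
   .setParseAttrs(parseAttrs)
   .setInferType(inferType)
   .setInferShape(inferShape)
   .setForward(forwardCPU, "cpu")
   .setForward(forwardGPU, "gpu")
   .setBackward(backwardCPU, "cpu")
   .setBackward(backwardGPU, "gpu");
   ```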
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
   - [ ] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments are documented. 
    - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
   - Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
   - [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
    - Add FCompute<gpu> registration and pass the NDArray context to the custom library in c_api.cc
    - Add context info to the MXTensor class in lib_api.h (see the sketch after this list)
    - Add the lib_custom_op/relu_lib.cu example file, containing the full registration of the custom operator "my_relu" along with both CPU and GPU kernel functions
    - Modify lib_custom_op/Makefile to compile the .cu file with nvcc into the custom library
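   
   Below is a minimal sketch of the context info added to MXTensor, reconstructed from the lib_api.h diffs quoted later in this thread; treat it as illustrative rather than the exact definition.
   ```
   #include <string>
   
   // sketch of the new context struct: dev_type is "cpu" or "gpu",
   // dev_id is the device index where the tensor lives
   struct MXContext {
     std::string dev_type;
     int dev_id;
   };
   
   // MXTensor (already defined in lib_api.h) gains a `ctx` member of this type,
   // so forward/backward functions can dispatch on inputs[0].ctx.dev_type
   ```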
   
   ## Comments ##

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367730787
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -450,32 +480,59 @@ int MXLoadLib(const char *path) {
         return ptr;
       };
 
+      // pass the gpu stream associated with the context to custom library
+      void* gpu_stream = nullptr;
+      if (inputs[i].ctx().dev_mask() == Context::kGPU) {
+        mshadow::Stream<mxnet::gpu> *s = ctx.get_stream<mxnet::gpu>();
+        gpu_stream = static_cast<void*>(mshadow::Stream<gpu>::GetStream(s));
+      }
+
       // call fcompute function
       CHECK(callFComp(fcomp_fp, attr_keys.data(), attr_vals.data(), attr_keys.size(),
-                      in_shapes.data(), in_dims.data(), in_data.data(),
-                      in_types.data(), in_verIDs.data(), in_data.size(),
-                      out_shapes.data(), out_dims.data(), out_data.data(),
-                      out_types.data(), out_verIDs.data(), out_data.size(),
-                      cpu_malloc, &cpu_alloc))
+                      in_shapes.data(), in_dims.data(), in_data.data(), in_types.data(),
+                      in_verIDs.data(), in_dev_type.data(), in_dev_id.data(), in_data.size(),
+                      out_shapes.data(), out_dims.data(), out_data.data(), out_types.data(),
+                      out_verIDs.data(), out_dev_type.data(), out_dev_id.data(), out_data.size(),
+                      cpu_malloc, &cpu_alloc, gpu_stream))
       << "Error calling FCompute for custom operator '" << name_str << "'";
 
       // return type void
     };
 
-    auto forward_lambda = [=](const nnvm::NodeAttrs& attrs,
-                              const OpContext& ctx,
-                              const std::vector<NDArray>& inputs,
-                              const std::vector<OpReqType>& req,
-                              const std::vector<NDArray>& outputs) {
-      return fcomp_lambda(fcomp_fp, attrs, ctx, inputs, req, outputs);
+    auto forward_cpu_lambda = [=](const nnvm::NodeAttrs& attrs,
+                                  const OpContext& ctx,
+                                  const std::vector<NDArray>& inputs,
+                                  const std::vector<OpReqType>& req,
+                                  const std::vector<NDArray>& outputs) {
+      CHECK(forward_ctx_map.count("cpu") > 0) << "CPU Forward function is not implemented";
+      return fcomp_lambda(forward_ctx_map.at("cpu"), attrs, ctx, inputs, req, outputs);
 
 Review comment:
   agree


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369388560
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -563,101 +623,30 @@ int MXLoadLib(const char *path) {
       }
 
       // create a pointer to hold custom op state object
+      // only create one stateful op depending on passing context
+      // user can add new supported context and call to custom library
       void* state_op_inst = nullptr;
-      CHECK(callCreateOpState(create_opstate_fp, attr_keys.data(), attr_vals.data(),
-                              attr_keys.size(), &state_op_inst))
-      << "Error calling CreateOpState for custom operator '" << name_str << "'";
-
+      if (ctx.dev_mask() == Context::kCPU) {
+        CHECK(createop_map.count("cpu") > 0)
+        << "CPU CreateOpState not implemented for '" << name_str << "'";
+        CHECK(callCreateOpState(createop_map.at("cpu"), attr_keys.data(), attr_vals.data(),
+                                attr_keys.size(), &state_op_inst))
+        << "Error calling CreateOpState CPU for custom operator '" << name_str << "'";
+      } else if (ctx.dev_mask() == Context::kGPU) {
+        CHECK(createop_map.count("gpu") > 0)
+        << "GPU CreateOpState not implemented for '" << name_str << "'";
+        CHECK(callCreateOpState(createop_map.at("gpu"), attr_keys.data(), attr_vals.data(),
+                                attr_keys.size(), &state_op_inst))
+        << "Error calling CreateOpState GPU for custom operator '" << name_str << "'";
+      }
       CHECK(state_op_inst != nullptr)
       << "Error custom library failed to create stateful operator '" << name_str << "'";
 
       CustomStatefulOp* state_op = reinterpret_cast<CustomStatefulOp*>(state_op_inst);
       return OpStatePtr::Create<CustomStatefulOpWrapper>(state_op);
     };
 
-    // stateful forward and backward
-    auto fstateful_lambda = [=](bool is_forward,
-                                const OpStatePtr& state_ptr,
-                                const OpContext& ctx,
-                                const std::vector<NDArray>& inputs,
-                                const std::vector<OpReqType>& req,
-                                const std::vector<NDArray>& outputs) {
-      std::vector<void*> in_data, out_data;
-      std::vector<const int64_t *> in_shapes, out_shapes;
-      std::vector<int> in_dims, out_dims;
-      std::vector<int> in_types, out_types;
-      std::vector<size_t> in_verIDs, out_verIDs;
-
-      // convert input tensors to constituent parts
-      for (size_t i = 0; i < inputs.size(); i++) {
-        in_data.push_back(inputs[i].data().dptr_);
-        in_shapes.push_back(inputs[i].shape().data());
-        in_dims.push_back(inputs[i].shape().ndim());
-        in_types.push_back(inputs[i].dtype());
-        in_verIDs.push_back(inputs[i].version());
-      }
-
-      // convert output tensors to constituent parts
-      for (size_t i = 0; i < outputs.size(); i++) {
-        out_data.push_back(outputs[i].data().dptr_);
-        out_shapes.push_back(outputs[i].shape().data());
-        out_dims.push_back(outputs[i].shape().ndim());
-        out_types.push_back(outputs[i].dtype());
-        out_verIDs.push_back(outputs[i].version());
-      }
-
-      // get memory resource
-      const Resource &resource = ctx.requested[0];
-      mshadow::Stream<mxnet::cpu> *cpu_stream = ctx.get_stream<mxnet::cpu>();
-
-      // create lambda that captures stream & resource objects
-      // this temp workspace holds memory allocated by custom library via OpResource
-      auto cpu_alloc = [&](int size) {
-        mshadow::Tensor<mxnet::cpu, 1, char> data =
-        resource.get_space_typed<mxnet::cpu, 1, char>(mshadow::Shape1(size), cpu_stream);
-        return data.dptr_;
-      };
-
-      // create lambda without captures so that we can cast it to function pointer
-      // this needs to be a lambda function so that we can do the decltype cast
-      typedef decltype(cpu_alloc) alloc_type;
-      auto cpu_malloc = [](void* _cpu_alloc, int size) {
-        // cast the void* argument to the type for the cpu_alloc lambda function
-        alloc_type* cpualloc = static_cast<alloc_type*>(_cpu_alloc);
-        // call cpu_alloc to actually allocate memory and get the pointer
-        void* ptr = (*cpualloc)(size);
-        return ptr;
-      };
-
-      // retrieve op state object created from CreateOpState
-      CustomStatefulOpWrapper& op = state_ptr.get_state<CustomStatefulOpWrapper>();
-      CustomStatefulOp* state_op_inst = op.get_instance();
-      CHECK(state_op_inst != nullptr)
-      << "Error MXNet cannot load custom stateful operator'" << name_str << "'";
-
-      // call fcompute function
-      CHECK(callFStatefulComp(is_forward, state_op_inst, in_shapes.data(), in_dims.data(),
-                              in_data.data(), in_types.data(), in_verIDs.data(), in_data.size(),
-                              out_shapes.data(), out_dims.data(), out_data.data(), out_types.data(),
-                              out_verIDs.data(), out_data.size(), cpu_malloc, &cpu_alloc))
-      << "Error calling FStatefulCompute for custom operator '" << name_str << "'";
-    };
-
-    auto fstateful_forward = [=](const OpStatePtr& state_ptr,
-                                 const OpContext& ctx,
-                                 const std::vector<NDArray>& inputs,
-                                 const std::vector<OpReqType>& req,
-                                 const std::vector<NDArray>& outputs) {
-      fstateful_lambda(true, state_ptr, ctx, inputs, req, outputs);
-    };
-
-    auto fstateful_backward = [=](const OpStatePtr& state_ptr,
-                                  const OpContext& ctx,
-                                  const std::vector<NDArray>& inputs,
-                                  const std::vector<OpReqType>& req,
-                                  const std::vector<NDArray>& outputs) {
-      fstateful_lambda(false, state_ptr, ctx, inputs, req, outputs);
-    };
+    /* -------------- BELOW ARE CUSTOM OPERATOR REGISTRATION --------------- */
 
 Review comment:
   I think you mean "BELOW IS THE REGISTRATION FOR CUSTOM OPERATORS"


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372637643
 
 

 ##########
 File path: example/extensions/lib_custom_op/test_relu.py
 ##########
 @@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=arguments-differ
+
+# This test checks dynamic loading of custom library into MXNet
+# and checks end to end compute of a simple 2D gemm custom op
+
+import mxnet as mx
+import os
+import time
+
+#load library
+if (os.name=='posix'):
+    path = os.path.abspath('librelu_lib.so')
+    mx.library.load(path)
+
+a = mx.nd.array([[-2,-1],[1,2]], ctx=mx.cpu())
+b = mx.nd.array([[-2,-1],[1,2]], ctx=mx.gpu())
+
+print("--------start ndarray compute---------")
+print(mx.nd.my_relu(a))
+print(mx.nd.my_relu(b))
+print(mx.nd.my_state_relu(a))
+print(mx.nd.my_state_relu(b))
+
+print("--------start symbolic compute--------")
+c = mx.sym.Variable('c')
+d = mx.sym.Variable('d')
+e = mx.sym.my_relu(c)
+base = mx.sym.relu(d)
+in_grad = [mx.nd.empty((2,2), ctx=mx.gpu())]
+in_grad_base = [mx.nd.empty((2,2), ctx=mx.gpu())]
+exe = e.bind(ctx=mx.gpu(), args={'c':b}, args_grad=in_grad)
+exe_base = base.bind(ctx=mx.gpu(), args={'d':b}, args_grad=in_grad_base)
+out = exe.forward()
+out_base = exe_base.forward()
+print(out)
+print(out_base)
+
+print("--------start backward compute--------")
+out_grad = mx.nd.ones((2,2), ctx=mx.gpu())
+exe.backward([out_grad])
+exe_base.backward([out_grad])
+print(in_grad)
+print(in_grad_base)
+
+print("--------start testing larger ndarray---------")
+a = mx.nd.uniform(shape=(100,100,100), ctx=mx.cpu())
+b = mx.nd.uniform(shape=(100,100,100), ctx=mx.gpu())
 
 Review comment:
   to not include the memory allocation in the performance timing?


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369335083
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -164,39 +186,60 @@ int MXLoadLib(const char *path) {
   for (int i = 0; i < numOps; i++) {
     const char* name;
     // function pointers holding implementation from custom library
-    fcomp_t fcomp_fp = nullptr;
     parseAttrs_t parse_fp = nullptr;
     inferType_t type_fp = nullptr;
     inferShape_t shape_fp = nullptr;
     // optional attributes
-    fcomp_t fgrad_fp = nullptr;
     mutateInputs_t mutate_fp = nullptr;
-    createOpState_t create_opstate_fp = nullptr;
     bool isSubgraphOp = false;
     int _isSubgraphOp = 0;
-
-    // get custom operator implemenation from the dynamic library
-    opRegGet(i, &name, &fcomp_fp, &fgrad_fp, &parse_fp, &type_fp, &shape_fp,
-             &mutate_fp, &create_opstate_fp, &_isSubgraphOp);
+    // lists of forward and backward function associated with each context
+    const char **forward_ctx, **backward_ctx, **createop_ctx;
+    fcomp_t *forward_fcomp, *backward_fcomp;
+    createOpState_t *createop_fp;
+    int forward_count, backward_count, createop_count;
+
+    // main function to get custom operator implemenation from the custom library
+    opRegGet(i, &name, &_isSubgraphOp,
+             &forward_ctx, &forward_fcomp, &forward_count,
+             &backward_ctx, &backward_fcomp, &backward_count,
+             &createop_ctx, &createop_fp, &createop_count,
+             &parse_fp, &type_fp, &shape_fp, &mutate_fp);
+
+    // construct maps of context to forward/backward custom library function
+    std::unordered_map<std::string, fcomp_t> forward_ctx_map;
+    std::unordered_map<std::string, fcomp_t> backward_ctx_map;
+    std::unordered_map<std::string, createOpState_t> createop_map;
+    for (int i=0; i<forward_count; i++) {
+      std::string ctx_str(forward_ctx[i]);
+      forward_ctx_map[ctx_str] = forward_fcomp[i];
 
 Review comment:
   I did the context dedup check in lib_api.h to catch the error early
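   
   A sketch of that early check, following the pattern visible in the lib_api.h diff later in this thread (member and helper names are taken from that diff; treat this as illustrative, not the final code):
   ```
   CustomOp& setForward(fcomp_t func, const char* ctx) {
     if (forward_ctx_map.count(ctx) > 0)
       raiseDuplicateContextError();  // fail inside the library, before MXNet is involved
     forward_ctx_map[ctx] = func;
     return *this;
   }
   ```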


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r371045630
 
 

 ##########
 File path: example/extensions/lib_custom_op/relu_lib.cu
 ##########
 @@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2020 by Contributors
+ * \file relu_lib.cu
+ * \brief simple custom relu operator implemented using CUDA function
+ */
+
+#include <iostream>
+#include "lib_api.h"
+
+__global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? in[tid] : 0;
+}
+
+__global__ void relu_gpu_backward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? 1 : 0;
 
 Review comment:
   thanks for pointing out the error. I have made the fix


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369386749
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -99,7 +99,135 @@ inline int MXAPIGetFunctionRegInfo(const FunRegType *e,
 // NOTE: return value is added in API_END
 
 /*!
- * \brief Loads dynamic library and initializes it
+ * \brief Common compute function dispatcher for forward/backward and stateful forward/backward
+ * state_ptr will be nullptr for regular ops; fcomp_fp is nullptr for stateful ops
+ */
+void CustomFComputeDispatcher(const std::string op_name,
+                              const opCallFComp_t callFComp,
+                              const fcomp_t fcomp_fp,
+                              const nnvm::NodeAttrs* attrs,
+                              const opCallFStatefulComp_t callFStatefulComp,
+                              int stateful_forward_flag,
+                              const OpStatePtr* state_ptr,
+                              const OpContext& ctx,
+                              const std::vector<NDArray>& inputs,
+                              const std::vector<OpReqType>& req,
+                              const std::vector<NDArray>& outputs) {
+  std::vector<void*> in_data, out_data;
+  std::vector<const int64_t *> in_shapes, out_shapes;
+  std::vector<int> in_dims, out_dims;
+  std::vector<int> in_types, out_types;
+  std::vector<size_t> in_verIDs, out_verIDs;
+  std::vector<const char*> in_dev_type, out_dev_type;
+  std::vector<int> in_dev_id, out_dev_id;
+
+  // convert inputs/outpus NDArray to C types to be passed to lib_api.h
+  for (size_t i = 0; i < inputs.size(); i++) {
+    in_data.push_back(inputs[i].data().dptr_);
+    in_shapes.push_back(inputs[i].shape().data());
+    in_dims.push_back(inputs[i].shape().ndim());
+    in_types.push_back(inputs[i].dtype());
+    in_verIDs.push_back(inputs[i].version());
+    const char* ctx_str = inputs[i].ctx().dev_mask() == Context::kCPU ? "cpu" : "gpu";
+    in_dev_type.push_back(ctx_str);
+    in_dev_id.push_back(inputs[i].ctx().real_dev_id());
+  }
+
+  for (size_t i = 0; i < outputs.size(); i++) {
+    out_data.push_back(outputs[i].data().dptr_);
+    out_shapes.push_back(outputs[i].shape().data());
+    out_dims.push_back(outputs[i].shape().ndim());
+    out_types.push_back(outputs[i].dtype());
+    out_verIDs.push_back(outputs[i].version());
+    const char* ctx_str = outputs[i].ctx().dev_mask() == Context::kCPU ? "cpu" : "gpu";
+    out_dev_type.push_back(ctx_str);
+    out_dev_id.push_back(outputs[i].ctx().real_dev_id());
+  }
+
+  // get memory resource and mxnet backend streams
+  const Resource &resource = ctx.requested[0];
+  mshadow::Stream<mxnet::cpu> *cpu_stream = ctx.get_stream<mxnet::cpu>();
+  mshadow::Stream<mxnet::gpu> *gpu_stream = ctx.get_stream<mxnet::gpu>();
+
+  // create lambda that captures stream & resource objects
+  // this temp workspace holds memory allocated by custom library via OpResource
+  auto cpu_alloc = [&](int size) {
+    mshadow::Tensor<mxnet::cpu, 1, char> workspace =
+      resource.get_space_typed<mxnet::cpu, 1, char>(mshadow::Shape1(size), cpu_stream);
+    return workspace.dptr_;
+  };
+  auto gpu_alloc = [&](int size) {
+    mshadow::Tensor<mxnet::gpu, 1, char> workspace =
+      resource.get_space_typed<mxnet::gpu, 1, char>(mshadow::Shape1(size), gpu_stream);
+    return workspace.dptr_;
+  };
+
+  // create lambda without captures so that we can cast it to function pointer
+  // this needs to be a lambda function so that we can do the decltype cast
+  typedef decltype(cpu_alloc) alloc_type_cpu;
+  auto cpu_malloc = [](void* _cpu_alloc, int size) {
+    // cast the void* argument to the type for the cpu_alloc lambda function
+    alloc_type_cpu* cpualloc = static_cast<alloc_type_cpu*>(_cpu_alloc);
+    // call cpu_alloc to actually allocate memory and get the pointer
+    void* ptr = (*cpualloc)(size);
 
 Review comment:
   is there some reason we're doing this in two lines instead of just `return (*cpualloc)(size)`?
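   
   For reference, a sketch of the simplified form the comment suggests. One detail worth keeping is an explicit `-> void*` return type, so the captureless lambda still converts to the plain `void* (*)(void*, int)` function pointer (xpu_malloc_t) that gets passed to the library.
   ```
   auto cpu_malloc = [](void* _cpu_alloc, int size) -> void* {
     alloc_type_cpu* cpualloc = static_cast<alloc_type_cpu*>(_cpu_alloc);
     return (*cpualloc)(size);  // return directly, no named temporary needed
   };
   ```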


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367696078
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -215,25 +225,43 @@ struct MXTensor {
   MXTensor() : data_ptr(NULL), dtype(kUNSET), verID(0) {}
 
   MXTensor(void *data_ptr, const std::vector<int64_t> &shape, MXDType dtype,
-           size_t vID)
-  : data_ptr(data_ptr), shape(shape), dtype(dtype), verID(vID) {}
+           size_t vID, MXContext mx_ctx)
+  : data_ptr(data_ptr), shape(shape), dtype(dtype), verID(vID), ctx(mx_ctx) {}
 
   /*! \brief populate internal tensor fields */
-  void setTensor(void *dptr, MXDType type, const int64_t* dims,
-                 int ndims, size_t vID) {
-    data_ptr = dptr; dtype = type; verID = vID;
+  void setTensor(void *dptr, MXDType type, const int64_t* dims, int ndims,
+                 size_t vID, MXContext mx_ctx) {
+    data_ptr = dptr; dtype = type; verID = vID; ctx = mx_ctx;
     shape.clear();
     for (int j = 0; j < ndims; j++) {
       shape.push_back(dims[j]);
     }
-    setDLTensor();
+    DLDeviceType dltype;
+    if (ctx.dev_type == "cpu")
+      dltype = kDLCPU;
+    else if (ctx.dev_type == "gpu")
+      dltype = kDLGPU;
+    else if (ctx.dev_type == "opencl")
+      dltype = kDLOpenCL;
+    else if (ctx.dev_type == "vulcan")
+      dltype = kDLVulkan;
+    else if (ctx.dev_type == "metal")
+      dltype = kDLMetal;
+    else if (ctx.dev_type == "vpi")
+      dltype = kDLVPI;
+    else if (ctx.dev_type == "rocm")
+      dltype = kDLROCM;
+    else
+      dltype = kDLExtDev;
 
 Review comment:
   should this be else or "ext"?


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369384340
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -328,16 +358,31 @@ typedef void* (*xpu_malloc_t)(void*, int);
  */
 class OpResource {
  public:
-  OpResource(xpu_malloc_t cm, void* ca) : cpu_malloc(cm), cpu_alloc(ca) {}
+  OpResource(xpu_malloc_t cm, void* ca, xpu_malloc_t gm, void* ga, void* st)
+    : cpu_malloc(cm), gpu_malloc(gm), cpu_alloc(ca), gpu_alloc(ga), cuda_stream(st) {}
 
   /*! \brief allocate memory controlled by MXNet */
-  void* alloc(int size) {
+  void* alloc_cpu(int size) {
     return cpu_malloc(cpu_alloc, size);
   }
 
+  /*! \brief allocate memory controlled by MXNet */
+  void* alloc_gpu(int size) {
+    return gpu_malloc(gpu_alloc, size);
+  }
+
+  /*! \brief return the gpu stream object */
+  void* get_cuda_stream() {
 
 Review comment:
   As mentioned above, consider doing this instead:
   ```
   #if defined(__NVCC__)
    cudaStream_t
   #else
    void*
   #endif
     get_cuda_stream() {
   ```
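   
   Filled out, the suggestion might look like the sketch below (assuming `cuda_stream` is the `void*` member added to OpResource in this PR):
   ```
   #if defined(__NVCC__)
     // compiled by nvcc: hand back a typed stream so no cast is needed in user code
     cudaStream_t get_cuda_stream() {
       return static_cast<cudaStream_t>(cuda_stream);
     }
   #else
     // plain C++ build: the stream stays an opaque pointer
     void* get_cuda_stream() {
       return cuda_stream;
     }
   #endif
   ```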


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367731019
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -685,28 +757,26 @@ int MXLoadLib(const char *path) {
       using namespace mxnet::op;
       regOp.set_num_inputs(DefaultSubgraphOpNumInputs);
       regOp.set_num_outputs(DefaultSubgraphOpNumOutputs);
-      regOp.set_attr<nnvm::FInferType>("FInferType",
-                                       DefaultSubgraphOpType, plevel);
-      regOp.set_attr<mxnet::FInferShape>("FInferShape",
-                                         DefaultSubgraphOpShape, plevel);
-      regOp.set_attr<FInferStorageType>("FInferStorageType",
-                                        DefaultSubgraphOpStorageType, plevel);
-      regOp.set_attr<FResourceRequest>("FResourceRequest",
-                                       DefaultSubgraphOpResourceRequest, plevel);
-      regOp.set_attr<nnvm::FMutateInputs>("FMutateInputs",
-                                          DefaultSubgraphOpMutableInputs, plevel);
+      regOp.set_attr<nnvm::FInferType>("FInferType", DefaultSubgraphOpType, plevel);
+      regOp.set_attr<mxnet::FInferShape>("FInferShape", DefaultSubgraphOpShape, plevel);
+      regOp.set_attr<FInferStorageType>("FInferStorageType", DefaultSubgraphOpStorageType, plevel);
+      regOp.set_attr<FResourceRequest>("FResourceRequest", DefaultSubgraphOpResourceRequest, plevel);
+      regOp.set_attr<nnvm::FMutateInputs>("FMutateInputs", DefaultSubgraphOpMutableInputs, plevel);
     }
 
     // optionally add stateful forward
     if (create_opstate_fp != nullptr) {
       regOp.set_attr<FCreateOpState>("FCreateOpState", create_opstate, plevel);
       regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
                                         fstateful_forward, plevel);
+      regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
 
 Review comment:
   agree


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367730719
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -982,17 +1045,22 @@ extern "C" {
     // create a vector of tensors for inputs
     std::vector<MXTensor> inputs(num_in);
     for (int i = 0; i < num_in; i++) {
-      inputs[i].setTensor(indata[i], (MXDType)intypes[i], inshapes[i], indims[i], inIDs[i]);
+      std::string ctx_str(indev_type[i]);
+      MXContext inctx = {ctx_str, indev_id[i]};
+      inputs[i].setTensor(indata[i], (MXDType)intypes[i], inshapes[i], indims[i],
 
 Review comment:
   agree


[GitHub] [incubator-mxnet] samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-578355722
 
 
   @rondogency can you try this:
   ```
   #if defined(_WIN32) && defined(_WIN64) && defined(__WINDOWS__)
     #define VISIBILITY  
   #else
     #define VISIBILITY  __attribute__ ((visibility ("hidden")))
   #endif
   
   template <class T>
   class Registry {
    public:
     /*!
      * \brief get singleton pointer to class
      * \returns pointer to class
      */
     static Registry* get() VISIBILITY {
       static Registry inst;
       return &inst;
     }
   ```


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369384033
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -328,16 +358,31 @@ typedef void* (*xpu_malloc_t)(void*, int);
  */
 class OpResource {
  public:
-  OpResource(xpu_malloc_t cm, void* ca) : cpu_malloc(cm), cpu_alloc(ca) {}
+  OpResource(xpu_malloc_t cm, void* ca, xpu_malloc_t gm, void* ga, void* st)
+    : cpu_malloc(cm), gpu_malloc(gm), cpu_alloc(ca), gpu_alloc(ga), cuda_stream(st) {}
 
   /*! \brief allocate memory controlled by MXNet */
 
 Review comment:
   can we change the comment to "allocate cpu memory"?


[GitHub] [incubator-mxnet] rondogency commented on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-579012695
 
 
   @ptrendx thanks for your comments! I have resolved them, and I would appreciate it if you could take another quick look and approve this.


[GitHub] [incubator-mxnet] samskalicky commented on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-578355722
 
 
   @rondogency can you try this:
   ```
   template <class T>
   class Registry {
    public:
     /*!
      * \brief get singleton pointer to class
      * \returns pointer to class
      */
     static Registry* get() {
       static Registry inst __attribute__ ((visibility ("hidden")));
       return &inst;
     }
   ```


[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r373270975
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -594,26 +657,58 @@ class CustomOp {
     mutate_inputs = func;
     return *this;
   }
-  CustomOp& setCreateOpState(createOpState_t func) {
-    create_opstate = func;
+  CustomOp& setCreateOpState(createOpState_t func, const char* ctx) {
+    if (create_op_ctx_map.count(ctx) > 0)
+      raiseDuplicateContextError();
+    create_op_ctx_map[ctx] = func;
     return *this;
   }
   CustomOp& setIsSubgraphOp() {
     isSGop = true;
     return *this;
   }
+  void mapToVector() {
+    for (auto kv : forward_ctx_map) {
+      forward_ctx_cstr.push_back(kv.first);
+      forward_fp.push_back(kv.second);
+    }
+    for (auto kv : backward_ctx_map) {
+      backward_ctx_cstr.push_back(kv.first);
+      backward_fp.push_back(kv.second);
+    }
+    for (auto kv : create_op_ctx_map) {
+      create_op_ctx_cstr.push_back(kv.first);
+      create_op_fp.push_back(kv.second);
+    }
+  }
+  ~CustomOp() {}
 
   /*! \brief operator name */
   const char* name;
+
   /*! \brief operator functions */
-  fcomp_t forward;
-  fcomp_t backward;
   parseAttrs_t parse_attrs;
   inferType_t infer_type;
   inferShape_t infer_shape;
   mutateInputs_t mutate_inputs;
-  createOpState_t create_opstate;
   bool isSGop;
+
+  /*! \brief vector repr of ctx map to be easily loaded from c_api */
+  std::vector<const char*> forward_ctx_cstr, backward_ctx_cstr, create_op_ctx_cstr;
+  std::vector<fcomp_t> forward_fp, backward_fp;
+  std::vector<createOpState_t> create_op_fp;
+
+ private:
+  void raiseDuplicateContextError() {
+    std::string op_name_str(name);
+    throw std::runtime_error(
+      "Error! Error! Cannot register multiple functions under same context for operator '"
+      + op_name_str + "'");
+  }
+
+  /*! \brief dedup context maps - static string ctx to custom function */
+  std::unordered_map<const char*, fcomp_t> forward_ctx_map, backward_ctx_map;
 
 Review comment:
   The keys of `forward_ctx_map`, `backward_ctx_map` and `create_op_ctx_map` are the pointer values of the strings rather than their contents. Although pointers to the same string literal are identical within one library, they differ across libraries. Using `std::string` as the key may be better.
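   
   A sketch of the suggested change; only the key type differs, so lookups compare string contents instead of pointer identity:
   ```
   // std::string keys: "gpu" registered from one library and looked up from
   // another translation unit still hash and compare equal
   std::unordered_map<std::string, fcomp_t> forward_ctx_map, backward_ctx_map;
   std::unordered_map<std::string, createOpState_t> create_op_ctx_map;
   ```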


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r368173736
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -566,16 +609,20 @@ typedef MXReturnValue (*createOpState_t)(std::map<std::string, std::string>,
 class CustomOp {
  public:
   explicit CustomOp(const char* op_name) : name(op_name),
-    forward(NULL), backward(NULL), parse_attrs(NULL), infer_type(NULL),
-    infer_shape(NULL), mutate_inputs(NULL), create_opstate(NULL),
-    isSGop(false) {}
-  ~CustomOp() {}
-  CustomOp& setForward(fcomp_t fcomp) {
-    forward = fcomp;
+    parse_attrs(NULL), infer_type(NULL), infer_shape(NULL), mutate_inputs(NULL),
+    create_opstate(NULL), isSGop(false) {}
+  CustomOp& setForward(fcomp_t fcomp, std::string ctx) {
+    char* cstr = new char[ctx.length()+1];
+    strncpy(cstr, ctx.c_str(), ctx.length()+1);
+    forward_ctx_cstr.push_back(cstr);
 
 Review comment:
   Changed the string parameter to const char*, since the user will pass in a static string; this way we avoid a redundant copy.


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367714197
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -450,32 +480,59 @@ int MXLoadLib(const char *path) {
         return ptr;
       };
 
+      // pass the gpu stream associated with the context to custom library
+      void* gpu_stream = nullptr;
+      if (inputs[i].ctx().dev_mask() == Context::kGPU) {
+        mshadow::Stream<mxnet::gpu> *s = ctx.get_stream<mxnet::gpu>();
+        gpu_stream = static_cast<void*>(mshadow::Stream<gpu>::GetStream(s));
+      }
+
       // call fcompute function
       CHECK(callFComp(fcomp_fp, attr_keys.data(), attr_vals.data(), attr_keys.size(),
-                      in_shapes.data(), in_dims.data(), in_data.data(),
-                      in_types.data(), in_verIDs.data(), in_data.size(),
-                      out_shapes.data(), out_dims.data(), out_data.data(),
-                      out_types.data(), out_verIDs.data(), out_data.size(),
-                      cpu_malloc, &cpu_alloc))
+                      in_shapes.data(), in_dims.data(), in_data.data(), in_types.data(),
+                      in_verIDs.data(), in_dev_type.data(), in_dev_id.data(), in_data.size(),
+                      out_shapes.data(), out_dims.data(), out_data.data(), out_types.data(),
+                      out_verIDs.data(), out_dev_type.data(), out_dev_id.data(), out_data.size(),
+                      cpu_malloc, &cpu_alloc, gpu_stream))
       << "Error calling FCompute for custom operator '" << name_str << "'";
 
       // return type void
     };
 
-    auto forward_lambda = [=](const nnvm::NodeAttrs& attrs,
-                              const OpContext& ctx,
-                              const std::vector<NDArray>& inputs,
-                              const std::vector<OpReqType>& req,
-                              const std::vector<NDArray>& outputs) {
-      return fcomp_lambda(fcomp_fp, attrs, ctx, inputs, req, outputs);
+    auto forward_cpu_lambda = [=](const nnvm::NodeAttrs& attrs,
+                                  const OpContext& ctx,
+                                  const std::vector<NDArray>& inputs,
+                                  const std::vector<OpReqType>& req,
+                                  const std::vector<NDArray>& outputs) {
+      CHECK(forward_ctx_map.count("cpu") > 0) << "CPU Forward function is not implemented";
+      return fcomp_lambda(forward_ctx_map.at("cpu"), attrs, ctx, inputs, req, outputs);
 
 Review comment:
   ```
   if(forward_ctx_map.count("cpu") > 0) {
       fcomp_t fcomp = forward_ctx_map.at("cpu");
       auto forward_cpu_lambda = [=](const nnvm::NodeAttrs& attrs,
                                     const OpContext& ctx,
                                     const std::vector<NDArray>& inputs,
                                     const std::vector<OpReqType>& req,
                                     const std::vector<NDArray>& outputs) {
         return fcomp_lambda(fcomp, attrs, ctx, inputs, req, outputs);
    } ;
   }


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369382573
 
 

 ##########
 File path: example/extensions/lib_custom_op/relu_lib.cu
 ##########
 @@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2020 by Contributors
+ * \file relu_lib.cu
+ * \brief simple custom relu operator implemented using CUDA function
+ */
+
+#include <iostream>
+#include "lib_api.h"
+
+__global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? in[tid] : 0;
+}
+
+__global__ void relu_gpu_backward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? 1 : 0;
+}
+
+MXReturnValue forwardCPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? in_data[i] : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardCPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? 1 : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue forwardGPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    // test on memory resource allocation
+    void *workspace_cpu = res.alloc_cpu(8 * sizeof(float));
+    void *workspace_gpu = res.alloc_gpu(8 * sizeof(float));
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
+    int64_t N = inputs[0].size();
+    int block = 256;
+    int grid = (N + (block - 1)) / block;
+    relu_gpu_forward<<<grid,block,0,cuda_stream>>>(out_data, in_data, N);
+
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardGPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
 
 Review comment:
   should we just have `get_cuda_stream` return a `cudaStream_t` type? We can use `#ifdef __NVCC__` around the code to protect when the user doesnt compile with nvcc (see [reference here](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#nvcc-identification-macro)) and just use `void*` type in the `#else` block when the user just compiles with gcc.
   
   It will simplify the user's code from:
   ```
   cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
   ```
   to
   ```
   cudaStream_t cuda_stream = res.get_cuda_stream();
   ```



[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372611985
 
 

 ##########
 File path: Makefile
 ##########
 @@ -664,11 +664,19 @@ cpplint:
 pylint:
 	python3 -m pylint --rcfile=$(ROOTDIR)/ci/other/pylintrc --ignore-patterns=".*\.so$$,.*\.dll$$,.*\.dylib$$" python/mxnet
 
-# sample lib for MXNet extension dynamically loading custom operator
-sample_lib:
-	$(CXX) -shared -fPIC -std=c++11 example/extensions/lib_custom_op/gemm_lib.cc -o libsample_lib.so -I include/mxnet
+# MXNet extension dynamically loading libraries
+EXT_LIBS = custom_op_lib subgraph_lib
+ifeq ($(USE_CUDA), 1)
+	EXT_LIBS += custom_op_gpu_lib
+endif
+extension_libs: $(EXT_LIBS)
+
+custom_op_lib:
+	$(CXX) -shared -fPIC -std=c++11 example/extensions/lib_custom_op/gemm_lib.cc -o build/libcustomop_lib.so -I include/mxnet
+custom_op_gpu_lib:
+	$(NVCC) -shared -std=c++11 -Xcompiler -fPIC example/extensions/lib_custom_op/relu_lib.cu -o build/libcustomop_gpu_lib.so -I include/mxnet
 
 Review comment:
   sure, makes sense.


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372699354
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -203,6 +214,16 @@ enum MXDType {
   kUNSET = 100,
 };
 
+/*!
+ * \brief Context info passing from MXNet OpContext
+ * dev_type is string repr of supported context, currently only "cpu" and "gpu"
+ * dev_id is the device index where the tensor locates
+ */
+typedef struct {
+  std::string dev_type;
 
 Review comment:
   No, this string only serves as an easy way for the custom library to check the device type; the actual context names passed from the custom library to MXNet are still const char*. Line 971 in lib_api.h is what we pass across the boundary.
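   
   Concretely, only plain C types cross the boundary and the std::string is built on the library side. A sketch of that pattern, mirroring the snippet quoted earlier in this thread where indev_type[i]/indev_id[i] arrive as const char*/int (the helper name here is hypothetical):
   ```
   MXContext make_context(const char* dev_type, int dev_id) {
     MXContext ctx;
     ctx.dev_type = std::string(dev_type);  // copy happens inside the custom library
     ctx.dev_id = dev_id;
     return ctx;
   }
   ```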


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r371920345
 
 

 ##########
 File path: CMakeLists.txt
 ##########
 @@ -752,6 +741,32 @@ elseif(MSVC)
 
 endif()
 
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
+target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
+target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
+if (USE_CUDA)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  target_include_directories(customop_gpu_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
+endif()
+if(UNIX)
+  target_compile_options(customop_lib PUBLIC -shared)
+  target_compile_options(subgraph_lib PUBLIC -shared)
+  if (USE_CUDA)
+    target_compile_options(customop_gpu_lib PUBLIC -shared)
+  endif()
+elseif(MSVC)
+  target_compile_options(customop_lib PUBLIC /LD)
+  target_compile_options(subgraph_lib PUBLIC /LD)
+  set_target_properties(customop_lib PROPERTIES PREFIX "lib")
+  set_target_properties(subgraph_lib PROPERTIES PREFIX "lib")
+  if (USE_CUDA)
+    set(CMAKE_VERBOSE_MAKEFILE ON)
 
 Review comment:
  can we remove this now that the Windows builds are working?


[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372699518
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -203,6 +214,16 @@ enum MXDType {
   kUNSET = 100,
 };
 
+/*!
+ * \brief Context info passing from MXNet OpContext
+ * dev_type is string repr of supported context, currently only "cpu" and "gpu"
+ * dev_id is the device index where the tensor locates
+ */
+typedef struct {
+  std::string dev_type;
 
 Review comment:
   I see. Thank you!


[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372696099
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -203,6 +214,16 @@ enum MXDType {
   kUNSET = 100,
 };
 
+/*!
+ * \brief Context info passing from MXNet OpContext
+ * dev_type is string repr of supported context, currently only "cpu" and "gpu"
+ * dev_id is the device index where the tensor locates
+ */
+typedef struct {
+  std::string dev_type;
 
 Review comment:
  ~I worry about the ABI compatibility of std::string, since MXNet and the custom library may be built with different compiler versions or even different compilers, e.g. gcc 5 and gcc 9, or gcc and clang. The implementation of std::string may differ. A solution is to use a C string (char*).~
   I see.


[GitHub] [incubator-mxnet] rondogency commented on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-576973649
 
 
  @samskalicky @eric-haibin-lin I would appreciate another review. The custom library can now allocate both GPU and CPU memory, and all dispatch logic is in MXNet.


[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372641060
 
 

 ##########
 File path: tests/python/unittest/test_extensions.py
 ##########
 @@ -148,3 +156,47 @@ def test_subgraph():
     out3 = exe3.forward()
     # check that result matches one executed by MXNet
     assert_almost_equal(out[0].asnumpy(), out3[0].asnumpy(), rtol=1e-3, atol=1e-3)
+
+@unittest.skipIf(check_platform(), "not all machine types supported")
+@unittest.skipIf(is_cd_run(), "continuous delivery run - ignoring test")
 
 Review comment:
   ok


[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r370909234
 
 

 ##########
 File path: example/extensions/lib_custom_op/relu_lib.cu
 ##########
 @@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2020 by Contributors
+ * \file relu_lib.cu
+ * \brief simple custom relu operator implemented using CUDA function
+ */
+
+#include <iostream>
+#include "lib_api.h"
+
+__global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? in[tid] : 0;
+}
+
+__global__ void relu_gpu_backward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? 1 : 0;
 
 Review comment:
  Random thing, but this is actually wrong, right? It should not be 1, but 1 * the incoming gradient, right?
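  A sketch of what a corrected backward kernel could look like, with a hypothetical extra pointer for the incoming gradient (the parameter name and order are assumptions, not the PR's final code):

  ```
  // Hypothetical fix: the input gradient is the incoming (output) gradient
  // wherever the forward input was positive, and 0 elsewhere.
  __global__ void relu_gpu_backward(float *out, float *in, float *ingrad, int64_t N) {
      int tid = blockIdx.x * blockDim.x + threadIdx.x;
      if (tid < N)
          out[tid] = in[tid] > 0 ? ingrad[tid] : 0;
  }
  ```

  The launch site would then pass the incoming-gradient buffer as an extra argument alongside `out_data` and `in_data`.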


[GitHub] [incubator-mxnet] eric-haibin-lin commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
eric-haibin-lin commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r366082428
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -720,8 +751,11 @@ int MXLoadLib(const char *path) {
         gradOp.set_attr<bool>("TIsLayerOpBackward", true, plevel);
         gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
                                             fstateful_backward, plevel);
+        gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
 
 Review comment:
  What are the target supported contexts for this feature? Do we target just CPU and GPU, or do we want to support other hardware backends, too?
  
  Currently the dispatch logic is inside FCompute, which is a bit different from existing MXNet users' experience. Usually FCompute only declares the computation and leaves the dispatch logic to the MXNet executor. It is also unclear how this supports the case where the same op is extended by one library for Intel CPUs and another for NVIDIA GPUs - each may hard-code dispatch logic that only cares about its own hardware. How do we handle such conflicts?
  
  Furthermore, infer_shape/infer_dtype is currently not context-aware, i.e. CPU and GPU infer the same dtype. However, that may not hold (e.g. CPU supports fp32 and bfloat16, while GPU supports fp32 and fp16). How do we handle these attribute conflicts?
  
  I had a short discussion with @yzhliu and we saw two potential fixes:
  1. Make infer_shape/infer_dtype context-aware. This way we can have different infer_dtype functions for CPU and GPU, and MXNet dispatches to the right one based on the current context. For example, `op.set_attr<FInferType>("FInferType<cpu>", my_infer_type_function)` for CPU-specific type inference, and `op.set_attr<FInferType>("FInferType<gpu>", my_infer_type_function_gpu)` for GPU (see the sketch below this comment).
  2. Another way is to register ops with different names (e.g. 'cpu_gemm' and 'gpu_gemm'). This way they can have different infer_attr functions. But we don't want users to modify the model definition in their training script to use these names. To mitigate that, we can have an API that lets the user provide a mapping (e.g. {'gemm' -> 'cpu_gemm'}) for MXNet to map an op to another op registered in the backend.
  
  Finally, is there a plan to support dynamic custom context? :P @samskalicky
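  As a thought experiment, approach 1 could surface in the custom-library registration API much like the per-context setForward/setBackward calls this PR adds. The `setInferType` overload taking a context string below does not exist in lib_api.h; it is purely a hypothetical sketch of the proposal:

  ```
  #include "lib_api.h"

  // Hypothetical: CPU-specific inference could accept fp32/bfloat16,
  // while GPU-specific inference could accept fp32/fp16.
  MXReturnValue inferTypeCPU(std::map<std::string, std::string> attrs,
                             std::vector<int> &intypes, std::vector<int> &outtypes) {
      outtypes[0] = intypes[0];  // a real version would also reject unsupported dtypes
      return MX_SUCCESS;
  }

  MXReturnValue inferTypeGPU(std::map<std::string, std::string> attrs,
                             std::vector<int> &intypes, std::vector<int> &outtypes) {
      outtypes[0] = intypes[0];
      return MX_SUCCESS;
  }

  REGISTER_OP(my_relu)
  .setInferType(inferTypeCPU, "cpu")   // hypothetical context-aware overload,
  .setInferType(inferTypeGPU, "gpu");  // mirroring setForward(forwardCPU, "cpu")
  ```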


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369386321
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -1056,26 +1171,30 @@ extern "C" {
   int
 #endif
   _opCallFStatefulCompute(int is_forward, void* state_op,
-                          const int64_t** inshapes, int* indims,
-                          void** indata, int* intypes, size_t* inIDs, int num_in,
-                          const int64_t** outshapes, int* outdims,
-                          void** outdata, int* outtypes, size_t* outIDs, int num_out,
-                          xpu_malloc_t cpu_malloc, void* cpu_alloc) {
+                          const int64_t** inshapes, int* indims, void** indata, int* intypes,
+                          size_t* inIDs, const char** indev_type, int* indev_id, int num_in,
+                          const int64_t** outshapes, int* outdims, void** outdata, int* outtypes,
+                          size_t* outIDs, const char** outdev_type, int* outdev_id, int num_out,
+                          xpu_malloc_t cpu_malloc, void* cpu_alloc,
+                          xpu_malloc_t gpu_malloc, void* gpu_alloc, void* stream) {
     // create a vector of tensors for inputs
     std::vector<MXTensor> inputs(num_in);
     for (int i = 0; i < num_in; i++) {
-      inputs[i].setTensor(indata[i], (MXDType)intypes[i], inshapes[i], indims[i], inIDs[i]);
+      inputs[i].setTensor(indata[i], (MXDType)intypes[i], inshapes[i], indims[i],
+                          inIDs[i], {indev_type[i], indev_id[i]});
     }
 
     // create a vector of tensors for outputs
     std::vector<MXTensor> outputs(num_out);
     for (int i = 0; i < num_out; i++) {
       outputs[i].setTensor(outdata[i], (MXDType)outtypes[i], outshapes[i], outdims[i],
-                           outIDs[i]);
+                           outIDs[i], {outdev_type[i], outdev_id[i]});
     }
-    OpResource res(cpu_malloc, cpu_alloc);
+
+    OpResource res(cpu_malloc, cpu_alloc, gpu_malloc, gpu_alloc, stream);
+
     CustomStatefulOp* op_ptr = reinterpret_cast<CustomStatefulOp*>(state_op);
-    if (is_forward) {
+    if (is_forward == 1) {
 
 Review comment:
  the equality comparison is not necessary since you're passing either a 1 or a 0 from c_api.cc, and `if (0)` evaluates to false
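  A tiny self-contained illustration of the simplification (hypothetical wrapper; none of these names are from the PR):

  ```
  #include <cstdio>

  // c_api.cc only ever passes 0 or 1, so the flag can be used directly.
  void callStatefulCompute(int is_forward) {
      if (is_forward) {  // behaves the same as `is_forward == 1` for 0/1 inputs
          std::printf("Forward\n");
      } else {
          std::printf("Backward\n");
      }
  }

  int main() {
      callStatefulCompute(1);
      callStatefulCompute(0);
      return 0;
  }
  ```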


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367701262
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -450,32 +480,59 @@ int MXLoadLib(const char *path) {
         return ptr;
       };
 
+      // pass the gpu stream associated with the context to custom library
+      void* gpu_stream = nullptr;
+      if (inputs[i].ctx().dev_mask() == Context::kGPU) {
+        mshadow::Stream<mxnet::gpu> *s = ctx.get_stream<mxnet::gpu>();
+        gpu_stream = static_cast<void*>(mshadow::Stream<gpu>::GetStream(s));
+      }
+
       // call fcompute function
       CHECK(callFComp(fcomp_fp, attr_keys.data(), attr_vals.data(), attr_keys.size(),
-                      in_shapes.data(), in_dims.data(), in_data.data(),
-                      in_types.data(), in_verIDs.data(), in_data.size(),
-                      out_shapes.data(), out_dims.data(), out_data.data(),
-                      out_types.data(), out_verIDs.data(), out_data.size(),
-                      cpu_malloc, &cpu_alloc))
+                      in_shapes.data(), in_dims.data(), in_data.data(), in_types.data(),
+                      in_verIDs.data(), in_dev_type.data(), in_dev_id.data(), in_data.size(),
+                      out_shapes.data(), out_dims.data(), out_data.data(), out_types.data(),
+                      out_verIDs.data(), out_dev_type.data(), out_dev_id.data(), out_data.size(),
+                      cpu_malloc, &cpu_alloc, gpu_stream))
       << "Error calling FCompute for custom operator '" << name_str << "'";
 
       // return type void
     };
 
-    auto forward_lambda = [=](const nnvm::NodeAttrs& attrs,
-                              const OpContext& ctx,
-                              const std::vector<NDArray>& inputs,
-                              const std::vector<OpReqType>& req,
-                              const std::vector<NDArray>& outputs) {
-      return fcomp_lambda(fcomp_fp, attrs, ctx, inputs, req, outputs);
+    auto forward_cpu_lambda = [=](const nnvm::NodeAttrs& attrs,
+                                  const OpContext& ctx,
+                                  const std::vector<NDArray>& inputs,
+                                  const std::vector<OpReqType>& req,
+                                  const std::vector<NDArray>& outputs) {
+      CHECK(forward_ctx_map.count("cpu") > 0) << "CPU Forward function is not implemented";
+      return fcomp_lambda(forward_ctx_map.at("cpu"), attrs, ctx, inputs, req, outputs);
 
 Review comment:
  so this implicitly copies the entire forward_ctx_map object into the lambda. Do we want to do this?
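  One way to avoid the copy, sketched below with stand-in types rather than the PR's actual code, is to resolve the map entry once outside the lambda and capture only the resulting function pointer:

  ```
  #include <cstdio>
  #include <map>
  #include <string>

  using fcomp_t = void (*)();

  void forwardCPUStub() { std::printf("cpu forward\n"); }

  int main() {
      std::map<std::string, fcomp_t> forward_ctx_map = {{"cpu", forwardCPUStub}};

      // Look up once; the lambda then captures only a small function pointer
      // by value instead of copying the whole map.
      fcomp_t fcomp_fp = forward_ctx_map.at("cpu");
      auto forward_cpu_lambda = [fcomp_fp]() { fcomp_fp(); };

      forward_cpu_lambda();
      return 0;
  }
  ```

  The trade-off is that the missing-context check then happens at registration time rather than at call time.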


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r371045994
 
 

 ##########
 File path: example/extensions/lib_custom_op/test_relu.py
 ##########
 @@ -0,0 +1,69 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=arguments-differ
+
+# This test checks dynamic loading of custom library into MXNet
+# and checks end to end compute of a simple 2D gemm custom op
+
+import mxnet as mx
+import os
+import time
+
+#load library
+if (os.name=='posix'):
+    path = os.path.abspath('librelu_lib.so')
+    mx.library.load(path)
+
+a = mx.nd.array([[-2,-1],[1,2]], ctx=mx.cpu())
+b = mx.nd.array([[-2,-1],[1,2]], ctx=mx.gpu())
+
+print("--------start ndarray compute---------")
+print(mx.nd.my_relu(a))
+print(mx.nd.my_relu(b))
+print(mx.nd.my_state_relu(a))
+print(mx.nd.my_state_relu(b))
+
+print("--------start symbolic compute--------")
+c = mx.sym.Variable('c')
+d = mx.sym.my_relu(c)
+in_grad = [mx.nd.empty((2,2), ctx=mx.gpu())]
+exe = d.bind(ctx=mx.gpu(), args={'c':b}, args_grad=in_grad)
+out = exe.forward()
+print(out)
+
+print("--------start backward compute--------")
+out_grad = mx.nd.ones((2,2), ctx=mx.gpu())
+exe.backward([out_grad])
+print(in_grad)
+
+print("--------start stress test---------")
+a = mx.nd.uniform(shape=(1000,1000,100), ctx=mx.cpu())
+b = mx.nd.uniform(shape=(1000,1000,100), ctx=mx.gpu())
+t1 = time.time()
+r1 = mx.nd.my_relu(a)
+t2 = time.time()
+r2 = mx.nd.my_relu(b)
+t3 = time.time()
+print("CPU running time:")
+print(t2 - t1)
 
 Review comment:
  thanks for pointing out the error. I didn't pay close attention since we will do more tests, including benchmark tests, in the coming PRs. I have made the fix now.


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369335343
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -563,11 +587,21 @@ int MXLoadLib(const char *path) {
       }
 
       // create a pointer to hold custom op state object
+      // only create one stateful op depending on passing context
+      // user can add new supported context and call to custom library
       void* state_op_inst = nullptr;
-      CHECK(callCreateOpState(create_opstate_fp, attr_keys.data(), attr_vals.data(),
-                              attr_keys.size(), &state_op_inst))
-      << "Error calling CreateOpState for custom operator '" << name_str << "'";
-
+      if (ctx.dev_mask() == Context::kCPU) {
+        CHECK(createop_map.count("cpu") > 0) << "CPU CreateOpState not implemented";
 
 Review comment:
   agree


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372603217
 
 

 ##########
 File path: Makefile
 ##########
 @@ -664,11 +664,19 @@ cpplint:
 pylint:
 	python3 -m pylint --rcfile=$(ROOTDIR)/ci/other/pylintrc --ignore-patterns=".*\.so$$,.*\.dll$$,.*\.dylib$$" python/mxnet
 
-# sample lib for MXNet extension dynamically loading custom operator
-sample_lib:
-	$(CXX) -shared -fPIC -std=c++11 example/extensions/lib_custom_op/gemm_lib.cc -o libsample_lib.so -I include/mxnet
+# MXNet extension dynamically loading libraries
+EXT_LIBS = custom_op_lib subgraph_lib
+ifeq ($(USE_CUDA), 1)
+	EXT_LIBS += custom_op_gpu_lib
+endif
+extension_libs: $(EXT_LIBS)
+
+custom_op_lib:
+	$(CXX) -shared -fPIC -std=c++11 example/extensions/lib_custom_op/gemm_lib.cc -o build/libcustomop_lib.so -I include/mxnet
+custom_op_gpu_lib:
+	$(NVCC) -shared -std=c++11 -Xcompiler -fPIC example/extensions/lib_custom_op/relu_lib.cu -o build/libcustomop_gpu_lib.so -I include/mxnet
 
 Review comment:
  we don't need to use the same MXNet flags to build the library; it's totally separate


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372604166
 
 

 ##########
 File path: Makefile
 ##########
 @@ -664,11 +664,19 @@ cpplint:
 pylint:
 	python3 -m pylint --rcfile=$(ROOTDIR)/ci/other/pylintrc --ignore-patterns=".*\.so$$,.*\.dll$$,.*\.dylib$$" python/mxnet
 
-# sample lib for MXNet extension dynamically loading custom operator
-sample_lib:
-	$(CXX) -shared -fPIC -std=c++11 example/extensions/lib_custom_op/gemm_lib.cc -o libsample_lib.so -I include/mxnet
+# MXNet extension dynamically loading libraries
+EXT_LIBS = custom_op_lib subgraph_lib
+ifeq ($(USE_CUDA), 1)
+	EXT_LIBS += custom_op_gpu_lib
+endif
+extension_libs: $(EXT_LIBS)
+
+custom_op_lib:
+	$(CXX) -shared -fPIC -std=c++11 example/extensions/lib_custom_op/gemm_lib.cc -o build/libcustomop_lib.so -I include/mxnet
+custom_op_gpu_lib:
+	$(NVCC) -shared -std=c++11 -Xcompiler -fPIC example/extensions/lib_custom_op/relu_lib.cu -o build/libcustomop_gpu_lib.so -I include/mxnet
 
 Review comment:
  it is a very small library intended purely for illustration, so it doesn't need to use the existing NVCC and CUDA_ARCH flags used to compile MXNet


[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372603452
 
 

 ##########
 File path: example/extensions/lib_custom_op/test_relu.py
 ##########
 @@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=arguments-differ
+
+# This test checks dynamic loading of custom library into MXNet
+# and checks end to end compute of a simple 2D gemm custom op
+
+import mxnet as mx
+import os
+import time
+
+#load library
+if (os.name=='posix'):
+    path = os.path.abspath('librelu_lib.so')
+    mx.library.load(path)
+
+a = mx.nd.array([[-2,-1],[1,2]], ctx=mx.cpu())
+b = mx.nd.array([[-2,-1],[1,2]], ctx=mx.gpu())
+
+print("--------start ndarray compute---------")
+print(mx.nd.my_relu(a))
+print(mx.nd.my_relu(b))
+print(mx.nd.my_state_relu(a))
+print(mx.nd.my_state_relu(b))
+
+print("--------start symbolic compute--------")
+c = mx.sym.Variable('c')
+d = mx.sym.Variable('d')
+e = mx.sym.my_relu(c)
+base = mx.sym.relu(d)
+in_grad = [mx.nd.empty((2,2), ctx=mx.gpu())]
+in_grad_base = [mx.nd.empty((2,2), ctx=mx.gpu())]
+exe = e.bind(ctx=mx.gpu(), args={'c':b}, args_grad=in_grad)
+exe_base = base.bind(ctx=mx.gpu(), args={'d':b}, args_grad=in_grad_base)
+out = exe.forward()
+out_base = exe_base.forward()
+print(out)
+print(out_base)
+
+print("--------start backward compute--------")
+out_grad = mx.nd.ones((2,2), ctx=mx.gpu())
+exe.backward([out_grad])
+exe_base.backward([out_grad])
+print(in_grad)
+print(in_grad_base)
+
+print("--------start testing larger ndarray---------")
+a = mx.nd.uniform(shape=(100,100,100), ctx=mx.cpu())
+b = mx.nd.uniform(shape=(100,100,100), ctx=mx.gpu())
 
 Review comment:
   There should be `mx.nd.waitall` after this line for the timings to be any good.


[GitHub] [incubator-mxnet] rondogency commented on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-580557852
 
 
   @wkcn Thanks! Can you help me to merge it?


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367251499
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -720,8 +751,11 @@ int MXLoadLib(const char *path) {
         gradOp.set_attr<bool>("TIsLayerOpBackward", true, plevel);
         gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
                                             fstateful_backward, plevel);
+        gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
 
 Review comment:
   I think that I support approach 1. 


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372700554
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -203,6 +214,16 @@ enum MXDType {
   kUNSET = 100,
 };
 
+/*!
+ * \brief Context info passing from MXNet OpContext
+ * dev_type is string repr of supported context, currently only "cpu" and "gpu"
+ * dev_id is the device index where the tensor locates
+ */
+typedef struct {
+  std::string dev_type;
 
 Review comment:
  I would appreciate it if you could take another look!


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367730749
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -97,9 +97,31 @@ inline int MXAPIGetFunctionRegInfo(const FunRegType *e,
 }
 
 // NOTE: return value is added in API_END
+/*!
+ * \brief Convert NDArray to C type arrays for passing to custom library
+ */
+void NDArrayToCTypeArray(const std::vector<NDArray>& inputs,
 
 Review comment:
   agree


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369383120
 
 

 ##########
 File path: example/extensions/lib_custom_op/relu_lib.cu
 ##########
 @@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2020 by Contributors
+ * \file relu_lib.cu
+ * \brief simple custom relu operator implemented using CUDA function
+ */
+
+#include <iostream>
+#include "lib_api.h"
+
+__global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? in[tid] : 0;
+}
+
+__global__ void relu_gpu_backward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? 1 : 0;
+}
+
+MXReturnValue forwardCPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? in_data[i] : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardCPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? 1 : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue forwardGPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    // test on memory resource allocation
+    void *workspace_cpu = res.alloc_cpu(8 * sizeof(float));
+    void *workspace_gpu = res.alloc_gpu(8 * sizeof(float));
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
+    int64_t N = inputs[0].size();
+    int block = 256;
+    int grid = (N + (block - 1)) / block;
+    relu_gpu_forward<<<grid,block,0,cuda_stream>>>(out_data, in_data, N);
+
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardGPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
+    int64_t N = inputs[0].size();
+    int block = 256;
+    int grid = (N + (block - 1)) / block;
+    relu_gpu_backward<<<grid,block,0,cuda_stream>>>(out_data, in_data, N);
+
+    return MX_SUCCESS;
+}
+
+MXReturnValue parseAttrs(std::map<std::string, std::string> attrs, int* num_in, int* num_out) {
+    *num_in = 1;
+    *num_out = 1;
+    return MX_SUCCESS;
+}
+
+MXReturnValue inferType(std::map<std::string, std::string> attrs,
+                        std::vector<int> &intypes,
+                        std::vector<int> &outtypes) {
+    outtypes[0] = intypes[0];
+    return MX_SUCCESS;
+}
+
+MXReturnValue inferShape(std::map<std::string, std::string> attrs,
+                         std::vector<std::vector<unsigned int>> &inshapes,
+                         std::vector<std::vector<unsigned int>> &outshapes) {
+    outshapes[0] = inshapes[0];
+    return MX_SUCCESS;
+}
+
+REGISTER_OP(my_relu)
+.setParseAttrs(parseAttrs)
+.setInferType(inferType)
+.setInferShape(inferShape)
+.setForward(forwardCPU, "cpu")
+.setForward(forwardGPU, "gpu")
+.setBackward(backwardCPU, "cpu")
+.setBackward(backwardGPU, "gpu");
+
+
+
+class MyStatefulReluCPU : public CustomStatefulOp {
+public:
+    explicit MyStatefulReluCPU() {}
+    MXReturnValue Forward(std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource op_res) {
+        std::map<std::string, std::string> attrs;
+        return forwardCPU(attrs, inputs, outputs, op_res);
+    }
+    MXReturnValue Backward(std::vector<MXTensor> inputs,
+                           std::vector<MXTensor> outputs,
+                           OpResource op_res) {
+        std::map<std::string, std::string> attrs;
+        return backwardCPU(attrs, inputs, outputs, op_res);
+    }
+    ~MyStatefulReluCPU() {}
+};
+
+class MyStatefulReluGPU : public CustomStatefulOp {
+public:
+    explicit MyStatefulReluGPU() {}
+    MXReturnValue Forward(std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource op_res) {
+        std::map<std::string, std::string> attrs;
+        return forwardGPU(attrs, inputs, outputs, op_res);
+    }
+    MXReturnValue Backward(std::vector<MXTensor> inputs,
+                           std::vector<MXTensor> outputs,
+                           OpResource op_res) {
+        std::map<std::string, std::string> attrs;
+        return backwardGPU(attrs, inputs, outputs, op_res);
+    }
+    ~MyStatefulReluGPU() {}
+};
+
+MXReturnValue createOpStateCPU(std::map<std::string, std::string> attrs,
+    CustomStatefulOp** op_inst) {
+    *op_inst = new MyStatefulReluCPU();
+    return MX_SUCCESS;
+}
+
+MXReturnValue createOpStateGPU(std::map<std::string, std::string> attrs,
+                               CustomStatefulOp** op_inst) {
+    *op_inst = new MyStatefulReluGPU();
+    return MX_SUCCESS;
+}
+
+REGISTER_OP(my_state_relu)
+.setParseAttrs(parseAttrs)
+.setInferType(inferType)
+.setInferShape(inferShape)
+.setCreateOpState(createOpStateCPU, "cpu")
+.setCreateOpState(createOpStateGPU, "gpu");
+
+MXReturnValue initialize(int version) {
+    if (version >= 10400) {
 
 Review comment:
  can we change the version number in the examples to 10600, since that's the current version on the master branch?
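  For illustration, a sketch of the example's initialize hook with the suggested floor (10600 corresponds to the 1.6 line under MXNet's major*10000 + minor*100 encoding; apart from the constant, the body just mirrors the pattern used by the existing examples and is not the PR's literal code):

  ```
  #include <iostream>
  #include "lib_api.h"

  MXReturnValue initialize(int version) {
      if (version >= 10600) {  // suggested: require MXNet 1.6+ rather than 1.4+
          std::cout << "MXNet version " << version << " supported" << std::endl;
          return MX_SUCCESS;
      } else {
          std::cout << "MXNet version " << version << " not supported" << std::endl;
          return MX_FAIL;
      }
  }
  ```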


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367251787
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -720,8 +751,11 @@ int MXLoadLib(const char *path) {
         gradOp.set_attr<bool>("TIsLayerOpBackward", true, plevel);
         gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
                                             fstateful_backward, plevel);
+        gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
 
 Review comment:
  I prefer approach 1; let's not do name mangling :D


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367703387
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -685,28 +757,26 @@ int MXLoadLib(const char *path) {
       using namespace mxnet::op;
       regOp.set_num_inputs(DefaultSubgraphOpNumInputs);
       regOp.set_num_outputs(DefaultSubgraphOpNumOutputs);
-      regOp.set_attr<nnvm::FInferType>("FInferType",
-                                       DefaultSubgraphOpType, plevel);
-      regOp.set_attr<mxnet::FInferShape>("FInferShape",
-                                         DefaultSubgraphOpShape, plevel);
-      regOp.set_attr<FInferStorageType>("FInferStorageType",
-                                        DefaultSubgraphOpStorageType, plevel);
-      regOp.set_attr<FResourceRequest>("FResourceRequest",
-                                       DefaultSubgraphOpResourceRequest, plevel);
-      regOp.set_attr<nnvm::FMutateInputs>("FMutateInputs",
-                                          DefaultSubgraphOpMutableInputs, plevel);
+      regOp.set_attr<nnvm::FInferType>("FInferType", DefaultSubgraphOpType, plevel);
+      regOp.set_attr<mxnet::FInferShape>("FInferShape", DefaultSubgraphOpShape, plevel);
+      regOp.set_attr<FInferStorageType>("FInferStorageType", DefaultSubgraphOpStorageType, plevel);
+      regOp.set_attr<FResourceRequest>("FResourceRequest", DefaultSubgraphOpResourceRequest, plevel);
+      regOp.set_attr<nnvm::FMutateInputs>("FMutateInputs", DefaultSubgraphOpMutableInputs, plevel);
     }
 
     // optionally add stateful forward
     if (create_opstate_fp != nullptr) {
       regOp.set_attr<FCreateOpState>("FCreateOpState", create_opstate, plevel);
       regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
                                         fstateful_forward, plevel);
+      regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
 
 Review comment:
  shouldn't we only register this conditionally?
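  In context, the guard might look something like the fragment below; it reuses the `createop_map.count(...)` pattern quoted earlier in this review thread and is only a sketch (not compilable on its own, and not the PR's final code):

  ```
  // Hypothetical fragment of the registration code in c_api.cc: only advertise
  // a GPU stateful compute entry point when the library registered a GPU
  // CreateOpState function.
  if (createop_map.count("gpu") > 0) {
    regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
                                       fstateful_forward, plevel);
  }
  ```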


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367731357
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -833,20 +891,26 @@ extern "C" {
 #else
   void
 #endif
-  _opRegGet(int idx, const char** name, fcomp_t* fcomp, fcomp_t* fgrad,
+  _opRegGet(int idx, const char** name,
+            const char*** forward_ctx, fcomp_t** forward_fp, int* forward_count,
+            const char*** backward_ctx, fcomp_t** backward_fp, int* backward_count,
             parseAttrs_t* parse, inferType_t* type,
             inferShape_t* shape, mutateInputs_t* mutate,
             createOpState_t* create_op, int *isSGop) {
-    CustomOp op = Registry<CustomOp>::get()->get(idx);
-    *name = op.name;
-    *fcomp = op.forward;
-    *fgrad = op.backward;
-    *parse = op.parse_attrs;
-    *type = op.infer_type;
-    *shape = op.infer_shape;
-    *mutate = op.mutate_inputs;
-    *create_op = op.create_opstate;
-    *isSGop = op.isSGop;
+    CustomOp *op = &(Registry<CustomOp>::get()->get(idx));
 
 Review comment:
  CustomOp& op works perfectly. Since the MXNet registry returns a reference, let's just use this solution.


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r373161746
 
 

 ##########
 File path: example/extensions/lib_custom_op/test_relu.py
 ##########
 @@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=arguments-differ
+
+# This test checks dynamic loading of custom library into MXNet
+# and checks end to end compute of a simple 2D gemm custom op
+
+import mxnet as mx
+import os
+import time
+
+#load library
+if (os.name=='posix'):
+    path = os.path.abspath('librelu_lib.so')
+    mx.library.load(path)
+
+a = mx.nd.array([[-2,-1],[1,2]], ctx=mx.cpu())
+b = mx.nd.array([[-2,-1],[1,2]], ctx=mx.gpu())
+
+print("--------start ndarray compute---------")
+print(mx.nd.my_relu(a))
+print(mx.nd.my_relu(b))
+print(mx.nd.my_state_relu(a))
+print(mx.nd.my_state_relu(b))
+
+print("--------start symbolic compute--------")
+c = mx.sym.Variable('c')
+d = mx.sym.Variable('d')
+e = mx.sym.my_relu(c)
+base = mx.sym.relu(d)
+in_grad = [mx.nd.empty((2,2), ctx=mx.gpu())]
+in_grad_base = [mx.nd.empty((2,2), ctx=mx.gpu())]
+exe = e.bind(ctx=mx.gpu(), args={'c':b}, args_grad=in_grad)
+exe_base = base.bind(ctx=mx.gpu(), args={'d':b}, args_grad=in_grad_base)
+out = exe.forward()
+out_base = exe_base.forward()
+print(out)
+print(out_base)
+
+print("--------start backward compute--------")
+out_grad = mx.nd.ones((2,2), ctx=mx.gpu())
+exe.backward([out_grad])
+exe_base.backward([out_grad])
+print(in_grad)
+print(in_grad_base)
+
+print("--------start testing larger ndarray---------")
+a = mx.nd.uniform(shape=(100,100,100), ctx=mx.cpu())
+b = mx.nd.uniform(shape=(100,100,100), ctx=mx.gpu())
 
 Review comment:
  I have made a PR for the CustomOp GPU support doc (#17486), and I will make this fix to the example in that PR as soon as this PR gets merged.


[GitHub] [incubator-mxnet] samskalicky commented on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-578446097
 
 
   > One question I have is how would you use other resources than temp workspace in the custom op, like a random resource?
   
  We're tracking all the other operator features to add for custom ops in #17006, and RNGs are on the list. Please add more items there and we'll get to them in the next PR.
   


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369382808
 
 

 ##########
 File path: example/extensions/lib_custom_op/relu_lib.cu
 ##########
 @@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2020 by Contributors
+ * \file relu_lib.cu
+ * \brief simple custom relu operator implemented using CUDA function
+ */
+
+#include <iostream>
+#include "lib_api.h"
+
+__global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? in[tid] : 0;
+}
+
+__global__ void relu_gpu_backward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? 1 : 0;
+}
+
+MXReturnValue forwardCPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? in_data[i] : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardCPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? 1 : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue forwardGPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    // test on memory resource allocation
+    void *workspace_cpu = res.alloc_cpu(8 * sizeof(float));
+    void *workspace_gpu = res.alloc_gpu(8 * sizeof(float));
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
+    int64_t N = inputs[0].size();
+    int block = 256;
+    int grid = (N + (block - 1)) / block;
+    relu_gpu_forward<<<grid,block,0,cuda_stream>>>(out_data, in_data, N);
+
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardGPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
+    int64_t N = inputs[0].size();
+    int block = 256;
+    int grid = (N + (block - 1)) / block;
+    relu_gpu_backward<<<grid,block,0,cuda_stream>>>(out_data, in_data, N);
+
+    return MX_SUCCESS;
+}
+
+MXReturnValue parseAttrs(std::map<std::string, std::string> attrs, int* num_in, int* num_out) {
+    *num_in = 1;
+    *num_out = 1;
+    return MX_SUCCESS;
+}
+
+MXReturnValue inferType(std::map<std::string, std::string> attrs,
+                        std::vector<int> &intypes,
+                        std::vector<int> &outtypes) {
+    outtypes[0] = intypes[0];
+    return MX_SUCCESS;
+}
+
+MXReturnValue inferShape(std::map<std::string, std::string> attrs,
+                         std::vector<std::vector<unsigned int>> &inshapes,
+                         std::vector<std::vector<unsigned int>> &outshapes) {
+    outshapes[0] = inshapes[0];
+    return MX_SUCCESS;
+}
+
+REGISTER_OP(my_relu)
+.setParseAttrs(parseAttrs)
+.setInferType(inferType)
+.setInferShape(inferShape)
+.setForward(forwardCPU, "cpu")
+.setForward(forwardGPU, "gpu")
+.setBackward(backwardCPU, "cpu")
+.setBackward(backwardGPU, "gpu");
+
+
+
 
 Review comment:
   can we just reduce this down to 1 extra empty line?


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r371613185
 
 

 ##########
 File path: CMakeLists.txt
 ##########
 @@ -752,6 +741,31 @@ elseif(MSVC)
 
 endif()
 
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
+target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
+target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
+if (USE_CUDA)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  target_include_directories(customop_gpu_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
+endif()
+if(UNIX)
+  target_compile_options(customop_lib PUBLIC -shared)
+  target_compile_options(subgraph_lib PUBLIC -shared)
+  if (USE_CUDA)
+    target_compile_options(customop_gpu_lib PUBLIC -shared)
+  endif()
+elseif(MSVC)
+  target_compile_options(customop_lib PUBLIC /LD)
+  target_compile_options(subgraph_lib PUBLIC /LD)
+  set_target_properties(customop_lib PROPERTIES PREFIX "lib")
+  set_target_properties(subgraph_lib PROPERTIES PREFIX "lib")
+  if (USE_CUDA)
+    target_compile_options(customop_gpu_lib PUBLIC /LD)
 
 Review comment:
   can you try adding:
   ```
  target_compile_options(customop_gpu_lib PUBLIC "$<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=-fPIC>")
   ```


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369382573
 
 

 ##########
 File path: example/extensions/lib_custom_op/relu_lib.cu
 ##########
 @@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2020 by Contributors
+ * \file relu_lib.cu
+ * \brief simple custom relu operator implemented using CUDA function
+ */
+
+#include <iostream>
+#include "lib_api.h"
+
+__global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? in[tid] : 0;
+}
+
+__global__ void relu_gpu_backward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? 1 : 0;
+}
+
+MXReturnValue forwardCPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? in_data[i] : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardCPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? 1 : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue forwardGPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    // test on memory resource allocation
+    void *workspace_cpu = res.alloc_cpu(8 * sizeof(float));
+    void *workspace_gpu = res.alloc_gpu(8 * sizeof(float));
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
+    int64_t N = inputs[0].size();
+    int block = 256;
+    int grid = (N + (block - 1)) / block;
+    relu_gpu_forward<<<grid,block,0,cuda_stream>>>(out_data, in_data, N);
+
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardGPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
 
 Review comment:
   should we just have `get_cuda_stream` return a `cudaStream_t` type? We can use `#ifdef __NVCC__` around the code to guard the case where the user doesn't compile with nvcc (see [reference here](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#nvcc-identification-macro)) and just use a `void*` type in the `#else` block.
   
   It will simplify the user's code from:
   ```
   cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
   ```
   to
   ```
   cudaStream_t cuda_stream = res.get_cuda_stream();
   ```
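A minimal sketch of what that accessor could look like, assuming the `mx_stream_t` typedef suggested later in this thread (the final lib_api.h API may differ):
```
// Sketch only: guard the CUDA type behind the nvcc identification macro so
// the header still compiles for CPU-only users; fall back to void* otherwise.
#if defined(__NVCC__)
  typedef cudaStream_t mx_stream_t;
#else
  typedef void* mx_stream_t;
#endif

class OpResource {
 public:
  /*! \brief return the stream MXNet scheduled this op on */
  mx_stream_t get_cuda_stream() const { return cuda_stream; }

 private:
  /*! \brief cuda stream passed from MXNet */
  mx_stream_t cuda_stream;
};
```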


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367700150
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -982,17 +1045,22 @@ extern "C" {
     // create a vector of tensors for inputs
     std::vector<MXTensor> inputs(num_in);
     for (int i = 0; i < num_in; i++) {
-      inputs[i].setTensor(indata[i], (MXDType)intypes[i], inshapes[i], indims[i], inIDs[i]);
+      std::string ctx_str(indev_type[i]);
+      MXContext inctx = {ctx_str, indev_id[i]};
+      inputs[i].setTensor(indata[i], (MXDType)intypes[i], inshapes[i], indims[i],
 
 Review comment:
   string can be initialized with char* http://www.cplusplus.com/reference/string/string/string/ (see constructor 4)
   
   can we just inline everything and not create intermediary variables?
   ```
   inputs[i].setTensor(indata[i], (MXDType)intypes[i], inshapes[i], indims[i], inIDs[i], {indev_type[i], indev_id[i]});
   ```


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367698887
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -602,12 +649,18 @@ class CustomOp {
     isSGop = true;
     return *this;
   }
+  ~CustomOp() {}
 
   /*! \brief operator name */
   const char* name;
+
+  /*! \brief each fcomp function is associated with a context specified by a string */
+  std::vector<const char*> forward_ctx_cstr;
+  std::vector<fcomp_t> forward_fp;
+  std::vector<const char*> backward_ctx_cstr;
 
 Review comment:
   should this be a map?
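   For illustration, a map-keyed variant might look like the rough sketch below (`fcomp_t` stands in for the function-pointer type lib_api.h already defines for forward/backward implementations, and the member names are only suggestions):
```
// Sketch only: key the registered implementations by context string instead
// of keeping two parallel vectors of context strings and function pointers.
#include <map>
#include <string>

class CustomOp {
 public:
  CustomOp& setForward(fcomp_t fcomp, const char* ctx) {
    forward_ctx_map[std::string(ctx)] = fcomp;   // one entry per context
    return *this;
  }
  CustomOp& setBackward(fcomp_t fcomp, const char* ctx) {
    backward_ctx_map[std::string(ctx)] = fcomp;
    return *this;
  }

  // context string ("cpu", "gpu", ...) -> registered implementation
  std::map<std::string, fcomp_t> forward_ctx_map;
  std::map<std::string, fcomp_t> backward_ctx_map;
};
```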


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367730959
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -635,11 +698,20 @@ int MXLoadLib(const char *path) {
       CHECK(state_op_inst != nullptr)
       << "Error MXNet cannot load custom stateful operator'" << name_str << "'";
 
+      // pass the gpu stream associated with the context to custom library
+      void* gpu_stream = nullptr;
+      if (inputs[i].ctx().dev_mask() == Context::kGPU) {
 
 Review comment:
   yes, will fix it


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369384854
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -328,16 +358,31 @@ typedef void* (*xpu_malloc_t)(void*, int);
  */
 class OpResource {
  public:
-  OpResource(xpu_malloc_t cm, void* ca) : cpu_malloc(cm), cpu_alloc(ca) {}
+  OpResource(xpu_malloc_t cm, void* ca, xpu_malloc_t gm, void* ga, void* st)
+    : cpu_malloc(cm), gpu_malloc(gm), cpu_alloc(ca), gpu_alloc(ga), cuda_stream(st) {}
 
   /*! \brief allocate memory controlled by MXNet */
-  void* alloc(int size) {
+  void* alloc_cpu(int size) {
     return cpu_malloc(cpu_alloc, size);
   }
 
+  /*! \brief allocate memory controlled by MXNet */
+  void* alloc_gpu(int size) {
+    return gpu_malloc(gpu_alloc, size);
+  }
+
+  /*! \brief return the gpu stream object */
+  void* get_cuda_stream() {
+    return cuda_stream;
+  }
+
  private:
-  xpu_malloc_t cpu_malloc;
-  void* cpu_alloc;
+  /*! \brief wrapper to allocation lambda function */
+  xpu_malloc_t cpu_malloc, gpu_malloc;
+  /*! \brief lambda function to return allocated memory handle */
+  void *cpu_alloc, *gpu_alloc;
+  /*! \brief cuda stream passed from MXNet */
+  void *cuda_stream;
 
 Review comment:
   and again here, or do this once up above and use everywhere:
   ```
   #if defined(__NVCC__)
    typedef cudaStream_t mx_stream_t;
   #else
    typedef void* mx_stream_t;
   #endif
   ```


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369335323
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -408,24 +452,13 @@ int MXLoadLib(const char *path) {
       std::vector<int> in_dims, out_dims;
       std::vector<int> in_types, out_types;
       std::vector<size_t> in_verIDs, out_verIDs;
+      std::vector<const char*> in_dev_type, out_dev_type;
+      std::vector<int> in_dev_id, out_dev_id;
 
-      // convert input tensors to constituent parts
-      for (size_t i = 0; i < inputs.size(); i++) {
-        in_data.push_back(inputs[i].data().dptr_);
-        in_shapes.push_back(inputs[i].shape().data());
-        in_dims.push_back(inputs[i].shape().ndim());
-        in_types.push_back(inputs[i].dtype());
-        in_verIDs.push_back(inputs[i].version());
-      }
-
-      // convert output tensors to constituent parts
-      for (size_t i = 0; i < outputs.size(); i++) {
-        out_data.push_back(outputs[i].data().dptr_);
-        out_shapes.push_back(outputs[i].shape().data());
-        out_dims.push_back(outputs[i].shape().ndim());
-        out_types.push_back(outputs[i].dtype());
-        out_verIDs.push_back(outputs[i].version());
-      }
+      NDArrayToCTypes(inputs, in_data, in_shapes, in_dims, in_types,
+                      in_verIDs, in_dev_type, in_dev_id);
+      NDArrayToCTypes(outputs, out_data, out_shapes, out_dims, out_types,
+                      out_verIDs, out_dev_type, out_dev_id);
 
 Review comment:
   actually it doesn't save space, so I ended up doing a bigger refactor lol


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367703437
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -685,28 +757,26 @@ int MXLoadLib(const char *path) {
       using namespace mxnet::op;
       regOp.set_num_inputs(DefaultSubgraphOpNumInputs);
       regOp.set_num_outputs(DefaultSubgraphOpNumOutputs);
-      regOp.set_attr<nnvm::FInferType>("FInferType",
-                                       DefaultSubgraphOpType, plevel);
-      regOp.set_attr<mxnet::FInferShape>("FInferShape",
-                                         DefaultSubgraphOpShape, plevel);
-      regOp.set_attr<FInferStorageType>("FInferStorageType",
-                                        DefaultSubgraphOpStorageType, plevel);
-      regOp.set_attr<FResourceRequest>("FResourceRequest",
-                                       DefaultSubgraphOpResourceRequest, plevel);
-      regOp.set_attr<nnvm::FMutateInputs>("FMutateInputs",
-                                          DefaultSubgraphOpMutableInputs, plevel);
+      regOp.set_attr<nnvm::FInferType>("FInferType", DefaultSubgraphOpType, plevel);
+      regOp.set_attr<mxnet::FInferShape>("FInferShape", DefaultSubgraphOpShape, plevel);
+      regOp.set_attr<FInferStorageType>("FInferStorageType", DefaultSubgraphOpStorageType, plevel);
+      regOp.set_attr<FResourceRequest>("FResourceRequest", DefaultSubgraphOpResourceRequest, plevel);
+      regOp.set_attr<nnvm::FMutateInputs>("FMutateInputs", DefaultSubgraphOpMutableInputs, plevel);
     }
 
     // optionally add stateful forward
     if (create_opstate_fp != nullptr) {
       regOp.set_attr<FCreateOpState>("FCreateOpState", create_opstate, plevel);
       regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
                                         fstateful_forward, plevel);
+      regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
+                                        fstateful_forward, plevel);
     } else {
-      regOp.set_attr<FComputeEx>("FComputeEx<cpu>", forward_lambda, plevel);
+      regOp.set_attr<FComputeEx>("FComputeEx<cpu>", forward_cpu_lambda, plevel);
+      regOp.set_attr<FComputeEx>("FComputeEx<gpu>", forward_gpu_lambda, plevel);
 
 Review comment:
   shouldn't we only register this conditionally?
   
   


[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r370909328
 
 

 ##########
 File path: example/extensions/lib_custom_op/test_relu.py
 ##########
 @@ -0,0 +1,69 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=arguments-differ
+
+# This test checks dynamic loading of custom library into MXNet
+# and checks end to end compute of a simple 2D gemm custom op
+
+import mxnet as mx
+import os
+import time
+
+#load library
+if (os.name=='posix'):
+    path = os.path.abspath('librelu_lib.so')
+    mx.library.load(path)
+
+a = mx.nd.array([[-2,-1],[1,2]], ctx=mx.cpu())
+b = mx.nd.array([[-2,-1],[1,2]], ctx=mx.gpu())
+
+print("--------start ndarray compute---------")
+print(mx.nd.my_relu(a))
+print(mx.nd.my_relu(b))
+print(mx.nd.my_state_relu(a))
+print(mx.nd.my_state_relu(b))
+
+print("--------start symbolic compute--------")
+c = mx.sym.Variable('c')
+d = mx.sym.my_relu(c)
+in_grad = [mx.nd.empty((2,2), ctx=mx.gpu())]
+exe = d.bind(ctx=mx.gpu(), args={'c':b}, args_grad=in_grad)
+out = exe.forward()
+print(out)
+
+print("--------start backward compute--------")
+out_grad = mx.nd.ones((2,2), ctx=mx.gpu())
+exe.backward([out_grad])
+print(in_grad)
+
+print("--------start stress test---------")
+a = mx.nd.uniform(shape=(1000,1000,100), ctx=mx.cpu())
+b = mx.nd.uniform(shape=(1000,1000,100), ctx=mx.gpu())
+t1 = time.time()
+r1 = mx.nd.my_relu(a)
+t2 = time.time()
+r2 = mx.nd.my_relu(b)
+t3 = time.time()
+print("CPU running time:")
+print(t2 - t1)
 
 Review comment:
   You should call `mx.nd.waitall()` before reading each timestamp; MXNet operations execute asynchronously, so without it the measurement is wrong.
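   A sketch of what the synchronized timing could look like, reusing the `a` (CPU) and `b` (GPU) arrays from the test script above:
```
import time
import mxnet as mx

mx.nd.waitall()                 # drain pending async work before timing
t1 = time.time()
r1 = mx.nd.my_relu(a)           # CPU run of the custom op
mx.nd.waitall()                 # wait until the CPU run actually finishes
t2 = time.time()
r2 = mx.nd.my_relu(b)           # GPU run of the custom op
mx.nd.waitall()                 # wait until the GPU run actually finishes
t3 = time.time()
print("CPU running time:", t2 - t1)
print("GPU running time:", t3 - t2)
```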


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r366123444
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -720,8 +751,11 @@ int MXLoadLib(const char *path) {
         gradOp.set_attr<bool>("TIsLayerOpBackward", true, plevel);
         gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
                                             fstateful_backward, plevel);
+        gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
 
 Review comment:
   > What is the target supported contexts for this feature? Do we target just cpu and gpu, or we want to support other hardware backends, too?
   
   Yes, let's just target the currently supported contexts. We can swing back around later when we add support for dynamically loaded contexts. We'll make sure to implement something generic in this PR (i.e. setting the context with a string rather than an enum) so it will just work.
   
   > Currently the dispatch logic is inside FCompute, which is a bit different from existing mxnet users' experience. Usually the FCompute only declares the computation, and leave the dispatch logic to MXNet executor. And it's unclear how it supports the case where the same op is extended by a library for Intel CPUs and NVIDIA GPUs - they may hard-code the dispatch logic to only care about their own hardware. How do we handle such conflicts?
    
   Good catch. We'll change it so that users can specify the context along with the Forward/Backward function in the registration. But for custom operators we can only support what the current implementation in MXNet allows, which is that the top-level scope is an operator that has implementations for different contexts.
   
   What you're describing is the top level being the context, with a distinct operator registration inside it. As you describe next, that organization is not supported in MXNet. We should discuss it as a separate feature enhancement from this PR (the scope of this PR is to add GPU support to custom operators only).
   
   > Furthermore, currently the infer_shape/infer_dtype is not context-aware, i.e. CPU and GPU infers the same dtype. However, it may not be true (e.g. cpu supports fp32 and bfloat16, and gpu supports fp32 and fp16). How do we handle these attribute conflict?
   > 
   > I had a short discussion with @yzhliu and we saw two potential fixes:
   > 
   > 1. make infer_shape/infer_dtype context aware. This way we can have different infer_dtype function for cpu & gpu. MXNet needs to dispatch to the function based on the current context. For example, `op.set_attr<FInferType>("FInferType<cpu>", my_infer_type_function)` for cpu specific type inference, and `op.set_attr<FInferType>("FInferType<gpu>", my_infer_type_function_gpu)`for gpu.
   > 2. Another way is to register ops with different names (e.g. 'cpu_gemm' and 'gpu_gemm'). This way they can have different infer_attr functions. But we don't want users to modify their model definition in the training script to these names. To mitigate that we can have an API to allow user to provide a mapping (e.g. {'gemm' -> 'cpu_gemm'}) for mxnet to map an op to another op registered in the backend.
   
   This is a good idea; we should have a separate PR to add this new feature to the MXNet backend, and then extend it out to custom operators. But it is out of scope for this PR.
   
   Another point is that the current custom operator support does not allow registering CPU implementations in one library and GPU implementations for the same operator in another. Merging functionality from different libraries is a good idea for a future feature.
   
   > Finally, is there a plan to support dynamic custom context? :P @samskalicky
   
   I'll add it to the list behind: custom data loaders, graph passes, etc... :D


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367730326
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -215,25 +225,43 @@ struct MXTensor {
   MXTensor() : data_ptr(NULL), dtype(kUNSET), verID(0) {}
 
   MXTensor(void *data_ptr, const std::vector<int64_t> &shape, MXDType dtype,
-           size_t vID)
-  : data_ptr(data_ptr), shape(shape), dtype(dtype), verID(vID) {}
+           size_t vID, MXContext mx_ctx)
+  : data_ptr(data_ptr), shape(shape), dtype(dtype), verID(vID), ctx(mx_ctx) {}
 
   /*! \brief populate internal tensor fields */
-  void setTensor(void *dptr, MXDType type, const int64_t* dims,
-                 int ndims, size_t vID) {
-    data_ptr = dptr; dtype = type; verID = vID;
+  void setTensor(void *dptr, MXDType type, const int64_t* dims, int ndims,
+                 size_t vID, MXContext mx_ctx) {
+    data_ptr = dptr; dtype = type; verID = vID; ctx = mx_ctx;
     shape.clear();
     for (int j = 0; j < ndims; j++) {
       shape.push_back(dims[j]);
     }
-    setDLTensor();
+    DLDeviceType dltype;
+    if (ctx.dev_type == "cpu")
+      dltype = kDLCPU;
+    else if (ctx.dev_type == "gpu")
+      dltype = kDLGPU;
+    else if (ctx.dev_type == "opencl")
+      dltype = kDLOpenCL;
+    else if (ctx.dev_type == "vulcan")
+      dltype = kDLVulkan;
+    else if (ctx.dev_type == "metal")
+      dltype = kDLMetal;
+    else if (ctx.dev_type == "vpi")
+      dltype = kDLVPI;
+    else if (ctx.dev_type == "rocm")
+      dltype = kDLROCM;
+    else
+      dltype = kDLExtDev;
 
 Review comment:
   agree, and for "ext" let's wait for other reviewers


[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372640971
 
 

 ##########
 File path: example/extensions/lib_custom_op/test_relu.py
 ##########
 @@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=arguments-differ
+
+# This test checks dynamic loading of custom library into MXNet
+# and checks end to end compute of a simple 2D gemm custom op
+
+import mxnet as mx
+import os
+import time
+
+#load library
+if (os.name=='posix'):
+    path = os.path.abspath('librelu_lib.so')
+    mx.library.load(path)
+
+a = mx.nd.array([[-2,-1],[1,2]], ctx=mx.cpu())
+b = mx.nd.array([[-2,-1],[1,2]], ctx=mx.gpu())
+
+print("--------start ndarray compute---------")
+print(mx.nd.my_relu(a))
+print(mx.nd.my_relu(b))
+print(mx.nd.my_state_relu(a))
+print(mx.nd.my_state_relu(b))
+
+print("--------start symbolic compute--------")
+c = mx.sym.Variable('c')
+d = mx.sym.Variable('d')
+e = mx.sym.my_relu(c)
+base = mx.sym.relu(d)
+in_grad = [mx.nd.empty((2,2), ctx=mx.gpu())]
+in_grad_base = [mx.nd.empty((2,2), ctx=mx.gpu())]
+exe = e.bind(ctx=mx.gpu(), args={'c':b}, args_grad=in_grad)
+exe_base = base.bind(ctx=mx.gpu(), args={'d':b}, args_grad=in_grad_base)
+out = exe.forward()
+out_base = exe_base.forward()
+print(out)
+print(out_base)
+
+print("--------start backward compute--------")
+out_grad = mx.nd.ones((2,2), ctx=mx.gpu())
+exe.backward([out_grad])
+exe_base.backward([out_grad])
+print(in_grad)
+print(in_grad_base)
+
+print("--------start testing larger ndarray---------")
+a = mx.nd.uniform(shape=(100,100,100), ctx=mx.cpu())
+b = mx.nd.uniform(shape=(100,100,100), ctx=mx.gpu())
 
 Review comment:
   yeah, and to not time those uniform ops.


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369384854
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -328,16 +358,31 @@ typedef void* (*xpu_malloc_t)(void*, int);
  */
 class OpResource {
  public:
-  OpResource(xpu_malloc_t cm, void* ca) : cpu_malloc(cm), cpu_alloc(ca) {}
+  OpResource(xpu_malloc_t cm, void* ca, xpu_malloc_t gm, void* ga, void* st)
+    : cpu_malloc(cm), gpu_malloc(gm), cpu_alloc(ca), gpu_alloc(ga), cuda_stream(st) {}
 
   /*! \brief allocate memory controlled by MXNet */
-  void* alloc(int size) {
+  void* alloc_cpu(int size) {
     return cpu_malloc(cpu_alloc, size);
   }
 
+  /*! \brief allocate memory controlled by MXNet */
+  void* alloc_gpu(int size) {
+    return gpu_malloc(gpu_alloc, size);
+  }
+
+  /*! \brief return the gpu stream object */
+  void* get_cuda_stream() {
+    return cuda_stream;
+  }
+
  private:
-  xpu_malloc_t cpu_malloc;
-  void* cpu_alloc;
+  /*! \brief wrapper to allocation lambda function */
+  xpu_malloc_t cpu_malloc, gpu_malloc;
+  /*! \brief lambda function to return allocated memory handle */
+  void *cpu_alloc, *gpu_alloc;
+  /*! \brief cuda stream passed from MXNet */
+  void *cuda_stream;
 
 Review comment:
   and again here, or do this once up above and use everywhere:
   ```
   #if defined(__NVCC__)
    typedef cudaStream_t mx_stream_t;
   #else
    typedef void* mx_stream_t;
   #endif
   
     mx_stream_t cuda_stream;
   
   ```


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369382573
 
 

 ##########
 File path: example/extensions/lib_custom_op/relu_lib.cu
 ##########
 @@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2020 by Contributors
+ * \file relu_lib.cu
+ * \brief simple custom relu operator implemented using CUDA function
+ */
+
+#include <iostream>
+#include "lib_api.h"
+
+__global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? in[tid] : 0;
+}
+
+__global__ void relu_gpu_backward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? 1 : 0;
+}
+
+MXReturnValue forwardCPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? in_data[i] : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardCPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? 1 : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue forwardGPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    // test on memory resource allocation
+    void *workspace_cpu = res.alloc_cpu(8 * sizeof(float));
+    void *workspace_gpu = res.alloc_gpu(8 * sizeof(float));
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
+    int64_t N = inputs[0].size();
+    int block = 256;
+    int grid = (N + (block - 1)) / block;
+    relu_gpu_forward<<<grid,block,0,cuda_stream>>>(out_data, in_data, N);
+
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardGPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
 
 Review comment:
   should we just have `get_cuda_stream` return a `cudaStream_t` type? We can use `#ifdef __NVCC__` around the code to guard the case where the user doesn't compile with nvcc (see [reference here](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#nvcc-identification-macro)) and just use a `void*` type in the `#else` block.
   
   It will simplify the user's code from:
   ```
   cudaStream_t cuda_stream = reinterpret_cast<cudaStream_t>(res.get_cuda_stream());
   ```
   to
   ```
   cudaStream_t cuda_stream = res.get_cuda_stream();
   ```


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367702747
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -635,11 +698,20 @@ int MXLoadLib(const char *path) {
       CHECK(state_op_inst != nullptr)
       << "Error MXNet cannot load custom stateful operator'" << name_str << "'";
 
+      // pass the gpu stream associated with the context to custom library
+      void* gpu_stream = nullptr;
+      if (inputs[i].ctx().dev_mask() == Context::kGPU) {
 
 Review comment:
   what is `i` here for `inputs[i]`? Shouldn't this be `inputs[0]`?


[GitHub] [incubator-mxnet] eric-haibin-lin commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
eric-haibin-lin commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367199659
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -720,8 +751,11 @@ int MXLoadLib(const char *path) {
         gradOp.set_attr<bool>("TIsLayerOpBackward", true, plevel);
         gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
                                             fstateful_backward, plevel);
+        gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
 
 Review comment:
   Are you in favor of approach 1 or approach 2? They have different implications for the recommended nnvm registration API for custom ops.


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367700687
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -97,9 +97,31 @@ inline int MXAPIGetFunctionRegInfo(const FunRegType *e,
 }
 
 // NOTE: return value is added in API_END
+/*!
+ * \brief Convert NDArray to C type arrays for passing to custom library
+ */
+void NDArrayToCTypeArray(const std::vector<NDArray>& inputs,
 
 Review comment:
   here we're converting a vector of NDArrays to vectors of C types, so shouldn't the name be "NDArrayToCTypes"?


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372729337
 
 

 ##########
 File path: example/extensions/lib_custom_op/test_relu.py
 ##########
 @@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=arguments-differ
+
+# This test checks dynamic loading of custom library into MXNet
+# and checks end to end compute of a simple 2D gemm custom op
+
+import mxnet as mx
+import os
+import time
+
+#load library
+if (os.name=='posix'):
+    path = os.path.abspath('librelu_lib.so')
+    mx.library.load(path)
+
+a = mx.nd.array([[-2,-1],[1,2]], ctx=mx.cpu())
+b = mx.nd.array([[-2,-1],[1,2]], ctx=mx.gpu())
+
+print("--------start ndarray compute---------")
+print(mx.nd.my_relu(a))
+print(mx.nd.my_relu(b))
+print(mx.nd.my_state_relu(a))
+print(mx.nd.my_state_relu(b))
+
+print("--------start symbolic compute--------")
+c = mx.sym.Variable('c')
+d = mx.sym.Variable('d')
+e = mx.sym.my_relu(c)
+base = mx.sym.relu(d)
+in_grad = [mx.nd.empty((2,2), ctx=mx.gpu())]
+in_grad_base = [mx.nd.empty((2,2), ctx=mx.gpu())]
+exe = e.bind(ctx=mx.gpu(), args={'c':b}, args_grad=in_grad)
+exe_base = base.bind(ctx=mx.gpu(), args={'d':b}, args_grad=in_grad_base)
+out = exe.forward()
+out_base = exe_base.forward()
+print(out)
+print(out_base)
+
+print("--------start backward compute--------")
+out_grad = mx.nd.ones((2,2), ctx=mx.gpu())
+exe.backward([out_grad])
+exe_base.backward([out_grad])
+print(in_grad)
+print(in_grad_base)
+
+print("--------start testing larger ndarray---------")
+a = mx.nd.uniform(shape=(100,100,100), ctx=mx.cpu())
+b = mx.nd.uniform(shape=(100,100,100), ctx=mx.gpu())
 
 Review comment:
   thanks for approving the PR. I'm wondering if we can merge it now, since we will do more benchmarking in upcoming PRs and this is a small example, so another CI run isn't worth it.


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367730341
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -566,16 +609,20 @@ typedef MXReturnValue (*createOpState_t)(std::map<std::string, std::string>,
 class CustomOp {
  public:
   explicit CustomOp(const char* op_name) : name(op_name),
-    forward(NULL), backward(NULL), parse_attrs(NULL), infer_type(NULL),
-    infer_shape(NULL), mutate_inputs(NULL), create_opstate(NULL),
-    isSGop(false) {}
-  ~CustomOp() {}
-  CustomOp& setForward(fcomp_t fcomp) {
-    forward = fcomp;
+    parse_attrs(NULL), infer_type(NULL), infer_shape(NULL), mutate_inputs(NULL),
+    create_opstate(NULL), isSGop(false) {}
+  CustomOp& setForward(fcomp_t fcomp, std::string ctx) {
+    char* cstr = new char[ctx.length()+1];
+    strncpy(cstr, ctx.c_str(), ctx.length()+1);
+    forward_ctx_cstr.push_back(cstr);
 
 Review comment:
   agree


[GitHub] [incubator-mxnet] samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-578608204
 
 
   @rondogency if you're going to add a new library for the GPU tests you need to modify CMakeLists.txt and Jenkins_steps.groovy. The CI uses CMake, and you need to persist the library between the CI's build and test stages by modifying the groovy file.
   
   See the subgraph property PR modifications for these files that add `libsubgraph_lib.so`:
   https://github.com/apache/incubator-mxnet/pull/17034/files
   
   At least the addition of the `__attribute__ ((visibility ("hidden")))` fixed the multiple library symbol problem:
   ```
   test_extensions.test_custom_op ... [02:06:10] src/c_api/c_api.cc:286: Found 2 operators in library
   [02:06:10] src/c_api/c_api.cc:350: 	Op[0] my_gemm
   [02:06:10] src/c_api/c_api.cc:350: 	Op[1] state_gemm
   [02:06:10] src/c_api/c_api.cc:785: Found 0 partitioners in library
   Info: stateful operator created
   Info: keyword + number of forward: 1
   Info: keyword + number of forward: 2
   ok (0.5803s)
   test_extensions.test_custom_op_gpu ... SKIP: ignoring custom_op_gpu test on cpu run
   MXNet version 10600 supported
   test_extensions.test_subgraph ... [02:06:11] src/c_api/c_api.cc:286: Found 1 operators in library
   [02:06:11] src/c_api/c_api.cc:350: 	Op[0] _custom_subgraph_op
   [02:06:11] src/c_api/c_api.cc:785: Found 1 partitioners in library
   [02:06:11] src/c_api/c_api.cc:801: 	Partitioner[0] myProp
   [02:06:11] src/c_api/c_api.cc:821: 		Strategy[0] strategy1 subgraphOp: '_custom_subgraph_op'
   ```
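   For readers unfamiliar with that attribute, here is a hypothetical illustration of how a header-defined registry can be kept library-local (the macro name is invented for this sketch; the actual change in lib_api.h may differ):
```
// Hide the symbol so each dynamically loaded extension library gets its own
// copy of the registry instead of resolving to one from another loaded library.
#if defined(__GNUC__) || defined(__clang__)
#define MX_EXT_LOCAL __attribute__((visibility("hidden")))
#else
#define MX_EXT_LOCAL
#endif

class MX_EXT_LOCAL Registry {
  // per-library operator registrations live here
};
```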
   


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369388924
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -685,28 +674,54 @@ int MXLoadLib(const char *path) {
       using namespace mxnet::op;
       regOp.set_num_inputs(DefaultSubgraphOpNumInputs);
       regOp.set_num_outputs(DefaultSubgraphOpNumOutputs);
-      regOp.set_attr<nnvm::FInferType>("FInferType",
-                                       DefaultSubgraphOpType, plevel);
-      regOp.set_attr<mxnet::FInferShape>("FInferShape",
-                                         DefaultSubgraphOpShape, plevel);
+      regOp.set_attr<nnvm::FInferType>("FInferType", DefaultSubgraphOpType, plevel);
+      regOp.set_attr<mxnet::FInferShape>("FInferShape", DefaultSubgraphOpShape, plevel);
       regOp.set_attr<FInferStorageType>("FInferStorageType",
                                         DefaultSubgraphOpStorageType, plevel);
       regOp.set_attr<FResourceRequest>("FResourceRequest",
                                        DefaultSubgraphOpResourceRequest, plevel);
       regOp.set_attr<nnvm::FMutateInputs>("FMutateInputs",
                                           DefaultSubgraphOpMutableInputs, plevel);
     }
-
     // optionally add stateful forward
-    if (create_opstate_fp != nullptr) {
+    if (createop_map.size() != 0) {
       regOp.set_attr<FCreateOpState>("FCreateOpState", create_opstate, plevel);
-      regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
-                                        fstateful_forward, plevel);
+      auto fstate_forward = [=](const OpStatePtr& state_ptr,
+                                const OpContext& ctx,
+                                const std::vector<NDArray>& inputs,
+                                const std::vector<OpReqType>& req,
+                                const std::vector<NDArray>& outputs) {
+        CustomFComputeDispatcher(name_str, nullptr, nullptr, nullptr,
+                                 callFStatefulComp, 1, &state_ptr, ctx, inputs, req, outputs);
+      };
+      regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>", fstate_forward, plevel);
+      regOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>", fstate_forward, plevel);
     } else {
-      regOp.set_attr<FComputeEx>("FComputeEx<cpu>", forward_lambda, plevel);
+      if (forward_ctx_map.count("cpu") > 0) {
+        auto forward_cpu_lambda = [=](const nnvm::NodeAttrs& attrs,
+                                      const OpContext& ctx,
+                                      const std::vector<NDArray>& inputs,
+                                      const std::vector<OpReqType>& req,
+                                      const std::vector<NDArray>& outputs) {
+          CustomFComputeDispatcher(name_str, callFComp, forward_ctx_map.at("cpu"), &attrs,
+                                   nullptr, 0, nullptr, ctx, inputs, req, outputs);
+        };
 
 Review comment:
   I think we should call `forward_ctx_map.at("cpu")` outside of the lambda and capture just the `fcomp_t`; otherwise the lambda will capture the whole `forward_ctx_map` map.
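   A sketch of that change, following the names in the surrounding c_api.cc code (not necessarily the final form):
```
// Look up the implementation once; the lambda then copies a single
// function pointer instead of capturing the whole forward_ctx_map.
fcomp_t forward_cpu_fp = forward_ctx_map.at("cpu");
auto forward_cpu_lambda = [=](const nnvm::NodeAttrs& attrs,
                              const OpContext& ctx,
                              const std::vector<NDArray>& inputs,
                              const std::vector<OpReqType>& req,
                              const std::vector<NDArray>& outputs) {
  CustomFComputeDispatcher(name_str, callFComp, forward_cpu_fp, &attrs,
                           nullptr, 0, nullptr, ctx, inputs, req, outputs);
};
```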


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369387057
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -99,7 +99,135 @@ inline int MXAPIGetFunctionRegInfo(const FunRegType *e,
 // NOTE: return value is added in API_END
 
 /*!
- * \brief Loads dynamic library and initializes it
+ * \brief Common compute function dispatcher for forward/backward and stateful forward/backward
+ * state_ptr will be nullptr for regular ops; fcomp_fp is nullptr for stateful ops
+ */
+void CustomFComputeDispatcher(const std::string op_name,
+                              const opCallFComp_t callFComp,
+                              const fcomp_t fcomp_fp,
+                              const nnvm::NodeAttrs* attrs,
+                              const opCallFStatefulComp_t callFStatefulComp,
+                              int stateful_forward_flag,
+                              const OpStatePtr* state_ptr,
+                              const OpContext& ctx,
+                              const std::vector<NDArray>& inputs,
+                              const std::vector<OpReqType>& req,
+                              const std::vector<NDArray>& outputs) {
+  std::vector<void*> in_data, out_data;
+  std::vector<const int64_t *> in_shapes, out_shapes;
+  std::vector<int> in_dims, out_dims;
+  std::vector<int> in_types, out_types;
+  std::vector<size_t> in_verIDs, out_verIDs;
+  std::vector<const char*> in_dev_type, out_dev_type;
+  std::vector<int> in_dev_id, out_dev_id;
+
+  // convert inputs/outpus NDArray to C types to be passed to lib_api.h
+  for (size_t i = 0; i < inputs.size(); i++) {
+    in_data.push_back(inputs[i].data().dptr_);
+    in_shapes.push_back(inputs[i].shape().data());
+    in_dims.push_back(inputs[i].shape().ndim());
+    in_types.push_back(inputs[i].dtype());
+    in_verIDs.push_back(inputs[i].version());
+    const char* ctx_str = inputs[i].ctx().dev_mask() == Context::kCPU ? "cpu" : "gpu";
+    in_dev_type.push_back(ctx_str);
+    in_dev_id.push_back(inputs[i].ctx().real_dev_id());
+  }
+
+  for (size_t i = 0; i < outputs.size(); i++) {
+    out_data.push_back(outputs[i].data().dptr_);
+    out_shapes.push_back(outputs[i].shape().data());
+    out_dims.push_back(outputs[i].shape().ndim());
+    out_types.push_back(outputs[i].dtype());
+    out_verIDs.push_back(outputs[i].version());
+    const char* ctx_str = outputs[i].ctx().dev_mask() == Context::kCPU ? "cpu" : "gpu";
+    out_dev_type.push_back(ctx_str);
+    out_dev_id.push_back(outputs[i].ctx().real_dev_id());
+  }
+
+  // get memory resource and mxnet backend streams
+  const Resource &resource = ctx.requested[0];
+  mshadow::Stream<mxnet::cpu> *cpu_stream = ctx.get_stream<mxnet::cpu>();
+  mshadow::Stream<mxnet::gpu> *gpu_stream = ctx.get_stream<mxnet::gpu>();
+
+  // create lambda that captures stream & resource objects
+  // this temp workspace holds memory allocated by custom library via OpResource
+  auto cpu_alloc = [&](int size) {
+    mshadow::Tensor<mxnet::cpu, 1, char> workspace =
+      resource.get_space_typed<mxnet::cpu, 1, char>(mshadow::Shape1(size), cpu_stream);
+    return workspace.dptr_;
+  };
+  auto gpu_alloc = [&](int size) {
+    mshadow::Tensor<mxnet::gpu, 1, char> workspace =
+      resource.get_space_typed<mxnet::gpu, 1, char>(mshadow::Shape1(size), gpu_stream);
+    return workspace.dptr_;
+  };
+
+  // create lambda without captures so that we can cast it to function pointer
+  // this needs to be a lambda function so that we can do the decltype cast
+  typedef decltype(cpu_alloc) alloc_type_cpu;
+  auto cpu_malloc = [](void* _cpu_alloc, int size) {
+    // cast the void* argument to the type for the cpu_alloc lambda function
+    alloc_type_cpu* cpualloc = static_cast<alloc_type_cpu*>(_cpu_alloc);
+    // call cpu_alloc to actually allocate memory and get the pointer
+    void* ptr = (*cpualloc)(size);
+    return ptr;
+  };
+  typedef decltype(gpu_alloc) alloc_type_gpu;
+  auto gpu_malloc = [](void* _gpu_alloc, int size) {
+    alloc_type_gpu* gpualloc = static_cast<alloc_type_gpu*>(_gpu_alloc);
+    void* ptr = (*gpualloc)(size);
+    return ptr;
+  };
+
+  // get actual cudaStream_t out of mxnet gpu stream and pass to lib_api.h
+  void *cuda_stream = nullptr;
+  if (inputs[0].ctx().dev_mask() == Context::kGPU) {
+    cuda_stream = static_cast<void*>(mshadow::Stream<gpu>::GetStream(gpu_stream));
+  }
+
+  CHECK((fcomp_fp != nullptr && state_ptr == nullptr)
+        || (fcomp_fp == nullptr && state_ptr != nullptr))
+  << "Can only register either regular op or stateful op for '" << op_name << "'";
+
+  if (fcomp_fp != nullptr) {
+    // convert attributes to vector of char*
+    std::vector<const char*> attr_keys, attr_vals;
+    for (auto kv : attrs->dict) {
+      attr_keys.push_back(kv.first.c_str());
+      attr_vals.push_back(kv.second.c_str());
+    }
+    // call fcompute function
+    CHECK(callFComp(fcomp_fp, attr_keys.data(), attr_vals.data(), attr_keys.size(),
+                    in_shapes.data(), in_dims.data(), in_data.data(), in_types.data(),
+                    in_verIDs.data(), in_dev_type.data(), in_dev_id.data(), in_data.size(),
+                    out_shapes.data(), out_dims.data(), out_data.data(), out_types.data(),
+                    out_verIDs.data(), out_dev_type.data(), out_dev_id.data(), out_data.size(),
+                    cpu_malloc, &cpu_alloc, gpu_malloc, &gpu_alloc, cuda_stream))
+    << "Error calling FCompute for custom operator '" << op_name << "'";
 
 Review comment:
   should we make this unique for the stateful operator? like: "Error calling FCompute for custom stateful operator '" << op_name << "'"


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367251499
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -720,8 +751,11 @@ int MXLoadLib(const char *path) {
         gradOp.set_attr<bool>("TIsLayerOpBackward", true, plevel);
         gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>",
                                             fstateful_backward, plevel);
+        gradOp.set_attr<FStatefulComputeEx>("FStatefulComputeEx<gpu>",
 
 Review comment:
   I think that I support approach 1. 


[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372601707
 
 

 ##########
 File path: Makefile
 ##########
 @@ -664,11 +664,19 @@ cpplint:
 pylint:
 	python3 -m pylint --rcfile=$(ROOTDIR)/ci/other/pylintrc --ignore-patterns=".*\.so$$,.*\.dll$$,.*\.dylib$$" python/mxnet
 
-# sample lib for MXNet extension dynamically loading custom operator
-sample_lib:
-	$(CXX) -shared -fPIC -std=c++11 example/extensions/lib_custom_op/gemm_lib.cc -o libsample_lib.so -I include/mxnet
+# MXNet extension dynamically loading libraries
+EXT_LIBS = custom_op_lib subgraph_lib
+ifeq ($(USE_CUDA), 1)
+	EXT_LIBS += custom_op_gpu_lib
+endif
+extension_libs: $(EXT_LIBS)
+
+custom_op_lib:
+	$(CXX) -shared -fPIC -std=c++11 example/extensions/lib_custom_op/gemm_lib.cc -o build/libcustomop_lib.so -I include/mxnet
+custom_op_gpu_lib:
+	$(NVCC) -shared -std=c++11 -Xcompiler -fPIC example/extensions/lib_custom_op/relu_lib.cu -o build/libcustomop_gpu_lib.so -I include/mxnet
 
 Review comment:
   Why don't you use the NVCCFLAGS and CUDA_ARCH flags here?


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372733802
 
 

 ##########
 File path: example/extensions/lib_custom_op/test_relu.py
 ##########
 @@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=arguments-differ
+
+# This test checks dynamic loading of custom library into MXNet
+# and checks end to end compute of a simple 2D gemm custom op
+
+import mxnet as mx
+import os
+import time
+
+#load library
+if (os.name=='posix'):
+    path = os.path.abspath('librelu_lib.so')
+    mx.library.load(path)
+
+a = mx.nd.array([[-2,-1],[1,2]], ctx=mx.cpu())
+b = mx.nd.array([[-2,-1],[1,2]], ctx=mx.gpu())
+
+print("--------start ndarray compute---------")
+print(mx.nd.my_relu(a))
+print(mx.nd.my_relu(b))
+print(mx.nd.my_state_relu(a))
+print(mx.nd.my_state_relu(b))
+
+print("--------start symbolic compute--------")
+c = mx.sym.Variable('c')
+d = mx.sym.Variable('d')
+e = mx.sym.my_relu(c)
+base = mx.sym.relu(d)
+in_grad = [mx.nd.empty((2,2), ctx=mx.gpu())]
+in_grad_base = [mx.nd.empty((2,2), ctx=mx.gpu())]
+exe = e.bind(ctx=mx.gpu(), args={'c':b}, args_grad=in_grad)
+exe_base = base.bind(ctx=mx.gpu(), args={'d':b}, args_grad=in_grad_base)
+out = exe.forward()
+out_base = exe_base.forward()
+print(out)
+print(out_base)
+
+print("--------start backward compute--------")
+out_grad = mx.nd.ones((2,2), ctx=mx.gpu())
+exe.backward([out_grad])
+exe_base.backward([out_grad])
+print(in_grad)
+print(in_grad_base)
+
+print("--------start testing larger ndarray---------")
+a = mx.nd.uniform(shape=(100,100,100), ctx=mx.cpu())
+b = mx.nd.uniform(shape=(100,100,100), ctx=mx.gpu())
 
 Review comment:
  @wkcn if you agree, would you please merge this PR?


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367697443
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -566,16 +609,20 @@ typedef MXReturnValue (*createOpState_t)(std::map<std::string, std::string>,
 class CustomOp {
  public:
   explicit CustomOp(const char* op_name) : name(op_name),
-    forward(NULL), backward(NULL), parse_attrs(NULL), infer_type(NULL),
-    infer_shape(NULL), mutate_inputs(NULL), create_opstate(NULL),
-    isSGop(false) {}
-  ~CustomOp() {}
-  CustomOp& setForward(fcomp_t fcomp) {
-    forward = fcomp;
+    parse_attrs(NULL), infer_type(NULL), infer_shape(NULL), mutate_inputs(NULL),
+    create_opstate(NULL), isSGop(false) {}
+  CustomOp& setForward(fcomp_t fcomp, std::string ctx) {
+    char* cstr = new char[ctx.length()+1];
+    strncpy(cstr, ctx.c_str(), ctx.length()+1);
+    forward_ctx_cstr.push_back(cstr);
 
 Review comment:
  I think we should use `const char*` here, like we do for `op_name`. This function is being called statically in the global scope.
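  A minimal sketch of that suggestion (hypothetical, not the PR's final code): take `const char*` and store the pointer directly, relying on the fact that registration passes string literals whose lifetime spans the whole program:
  ```
  // sketch: keep the caller's context string instead of copying it
  CustomOp& setForward(fcomp_t fcomp, const char* ctx) {
    forward_ctx_cstr.push_back(ctx);  // "cpu"/"gpu" literals outlive the registry
    forward_fp.push_back(fcomp);
    return *this;
  }
  ```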


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369155919
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -563,11 +587,21 @@ int MXLoadLib(const char *path) {
       }
 
       // create a pointer to hold custom op state object
+      // only create one stateful op depending on passing context
+      // user can add new supported context and call to custom library
       void* state_op_inst = nullptr;
-      CHECK(callCreateOpState(create_opstate_fp, attr_keys.data(), attr_vals.data(),
-                              attr_keys.size(), &state_op_inst))
-      << "Error calling CreateOpState for custom operator '" << name_str << "'";
-
+      if (ctx.dev_mask() == Context::kCPU) {
+        CHECK(createop_map.count("cpu") > 0) << "CPU CreateOpState not implemented";
 
 Review comment:
  this is good, but can we make the error message clearer by adding the operator name at the end, so the message reads more like: 
  > CPU CreateOpState is not implemented for operator zzzzz
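  For illustration, a sketch of that check using the `name_str` already in scope in this diff (the exact wording is only a suggestion):
  ```
  CHECK(createop_map.count("cpu") > 0)
    << "CPU CreateOpState is not implemented for operator '" << name_str << "'";
  ```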


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r373278868
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -594,26 +657,58 @@ class CustomOp {
     mutate_inputs = func;
     return *this;
   }
-  CustomOp& setCreateOpState(createOpState_t func) {
-    create_opstate = func;
+  CustomOp& setCreateOpState(createOpState_t func, const char* ctx) {
+    if (create_op_ctx_map.count(ctx) > 0)
+      raiseDuplicateContextError();
+    create_op_ctx_map[ctx] = func;
     return *this;
   }
   CustomOp& setIsSubgraphOp() {
     isSGop = true;
     return *this;
   }
+  void mapToVector() {
+    for (auto kv : forward_ctx_map) {
+      forward_ctx_cstr.push_back(kv.first);
+      forward_fp.push_back(kv.second);
+    }
+    for (auto kv : backward_ctx_map) {
+      backward_ctx_cstr.push_back(kv.first);
+      backward_fp.push_back(kv.second);
+    }
+    for (auto kv : create_op_ctx_map) {
+      create_op_ctx_cstr.push_back(kv.first);
+      create_op_fp.push_back(kv.second);
+    }
+  }
+  ~CustomOp() {}
 
   /*! \brief operator name */
   const char* name;
+
   /*! \brief operator functions */
-  fcomp_t forward;
-  fcomp_t backward;
   parseAttrs_t parse_attrs;
   inferType_t infer_type;
   inferShape_t infer_shape;
   mutateInputs_t mutate_inputs;
-  createOpState_t create_opstate;
   bool isSGop;
+
+  /*! \brief vector repr of ctx map to be easily loaded from c_api */
+  std::vector<const char*> forward_ctx_cstr, backward_ctx_cstr, create_op_ctx_cstr;
+  std::vector<fcomp_t> forward_fp, backward_fp;
+  std::vector<createOpState_t> create_op_fp;
+
+ private:
+  void raiseDuplicateContextError() {
+    std::string op_name_str(name);
+    throw std::runtime_error(
+      "Error! Error! Cannot register multiple functions under same context for operator '"
+      + op_name_str + "'");
+  }
+
+  /*! \brief dedup context maps - static string ctx to custom function */
+  std::unordered_map<const char*, fcomp_t> forward_ctx_map, backward_ctx_map;
 
 Review comment:
  This map is used for dedup inside each library. Loading multiple libraries means different operators anyway, since one registry object is created per custom library and custom operators are registered individually in each library.
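  As a hypothetical illustration (not code from the PR), a duplicate registration that this per-library dedup map is meant to reject at library load time:
  ```
  // two forward functions registered under the same "cpu" context for one op;
  // the second setForward call hits the duplicate-context runtime_error
  REGISTER_OP(my_relu)
  .setForward(forwardCPU, "cpu")
  .setForward(forwardGPU, "cpu");
  ```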


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367730693
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -602,12 +649,18 @@ class CustomOp {
     isSGop = true;
     return *this;
   }
+  ~CustomOp() {}
 
   /*! \brief operator name */
   const char* name;
+
+  /*! \brief each fcomp function is associated with a context speficied by a string*/
+  std::vector<const char*> forward_ctx_cstr;
+  std::vector<fcomp_t> forward_fp;
+  std::vector<const char*> backward_ctx_cstr;
 
 Review comment:
  I agree that a map can dedup custom registrations, but opRegGet needs to pass contiguous memory to c_api, so let's shelve it for now


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369133681
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -740,68 +787,81 @@ class Registry {
 typedef int (*opRegSize_t)(void);
 
 #define MXLIB_OPREGGET_STR "_opRegGet"
-typedef int (*opRegGet_t)(int, const char**, fcomp_t*, fcomp_t*,
-                          parseAttrs_t*, inferType_t*,
-                          inferShape_t*, mutateInputs_t*,
-                          createOpState_t*, int*);
+typedef int (*opRegGet_t)(int idx, const char** name, int *isSGop,
+                          const char*** forward_ctx, fcomp_t** forward_fp, int* forward_count,
+                          const char*** backward_ctx, fcomp_t** backward_fp, int* backward_count,
+                          const char*** create_op_ctx, createOpState_t** create_op_fp, int* create_op_count,
+                          parseAttrs_t* parse, inferType_t* type,
+                          inferShape_t* shape, mutateInputs_t* mutate);
 
 #define MXLIB_OPCALLFREE_STR "_opCallFree"
-typedef int (*opCallFree_t)(void*);
+typedef int (*opCallFree_t)(void* ptr);
 
 #define MXLIB_OPCALLPARSEATTRS_STR "_opCallParseAttrs"
-typedef int (*opCallParseAttrs_t)(parseAttrs_t, const char* const*, const char* const*, int,
-                                  int*, int*);
+typedef int (*opCallParseAttrs_t)(parseAttrs_t parseAttrs, const char* const* keys,
+                                  const char* const* vals, int num,
+                                  int* num_in, int* num_out);
 
 #define MXLIB_OPCALLINFERSHAPE_STR "_opCallInferShape"
-typedef int (*opCallInferShape_t)(inferShape_t, const char* const*, const char* const*, int,
-                                  unsigned int**, int*, int,
-                                  unsigned int***, int**, int);
+typedef int (*opCallInferShape_t)(inferShape_t inferShape, const char* const* keys,
+                                  const char* const* vals, int num,
+                                  unsigned int** inshapes, int* indims, int num_in,
+                                  unsigned int*** outshapes, int** outdims, int num_out);
 
 #define MXLIB_OPCALLINFERTYPE_STR "_opCallInferType"
-typedef int (*opCallInferType_t)(inferType_t, const char* const*, const char* const*, int,
-                                  int*, int, int*, int);
+typedef int (*opCallInferType_t)(inferType_t inferType, const char* const* keys,
+                                 const char* const* vals, int num,
+                                 int* intypes, int num_in, int* outtypes, int num_out);
 
 #define MXLIB_OPCALLFCOMP_STR "_opCallFCompute"
-typedef int (*opCallFComp_t)(fcomp_t, const char* const*, const char* const*, int,
-                             const int64_t**, int*, void**, int*, size_t*, int,
-                             const int64_t**, int*, void**, int*, size_t*, int,
-                             xpu_malloc_t, void*);
+typedef int (*opCallFComp_t)(fcomp_t fcomp, const char* const* keys, const char* const* vals, int num,
+                             const int64_t** inshapes, int* indims, void** indata, int* intypes,
+                             size_t* inIDs, const char** indev_type, int* indev_id, int num_in,
+                             const int64_t** outshapes, int* outdims, void** outdata, int* outtypes,
+                             size_t* outIDs, const char** outdev_type, int* outdev_id, int num_out,
+                             xpu_malloc_t cpu_malloc, void* cpu_alloc, void* stream);
 
 #define MXLIB_OPCALLMUTATEINPUTS_STR "_opCallMutateInputs"
-typedef int (*opCallMutateInputs_t)(mutateInputs_t, const char* const*, const char* const*, int,
-                                    int**, int*);
+typedef int (*opCallMutateInputs_t)(mutateInputs_t mutate, const char* const* keys,
+                                    const char* const* vals, int num,
+                                    int** mutate_indices, int* indices_size);
 
 #define MXLIB_OPCALLCREATEOPSTATE_STR "_opCallCreateOpState"
-typedef int (*opCallCreateOpState_t)(createOpState_t, const char* const*, const char* const*, int,
-                                     void**);
+typedef int (*opCallCreateOpState_t)(createOpState_t create_op, const char* const* keys,
+                                     const char* const* vals, int num,
+                                     void** state_op);
 
 #define MXLIB_OPCALLFSTATEFULCOMP_STR "_opCallFStatefulCompute"
-typedef int (*opCallFStatefulComp_t)(int, void*, const int64_t**, int*, void**, int*, size_t*,
-                                     int, const int64_t**, int*, void**, int*, size_t*,
-                                     int, xpu_malloc_t, void*);
+typedef int (*opCallFStatefulComp_t)(int is_forward, void* state_op,
+                                     const int64_t** inshapes, int* indims, void** indata, int* intypes,
+                                     size_t* inIDs, const char** indev_type, int* indev_id, int num_in,
+                                     const int64_t** outshapes, int* outdims, void** outdata, int* outtypes,
+                                     size_t* outIDs, const char** outdev_type, int* outdev_id, int num_out,
+                                     xpu_malloc_t cpu_malloc, void* cpu_alloc, void* stream);
 
 #define MXLIB_PARTREGSIZE_STR "_partRegSize"
 typedef int (*partRegSize_t)(void);
 
 #define MXLIB_PARTREGGETCOUNT_STR "_partRegGetCount"
-typedef int (*partRegGetCount_t)(int, const char**);
+typedef int (*partRegGetCount_t)(int idx, const char** name);
 
 #define MXLIB_PARTREGGET_STR "_partRegGet"
-typedef void (*partRegGet_t)(int, int, const char**, supportedOps_t*,
-                            acceptSubgraph_t*, const char**);
+typedef void (*partRegGet_t)(int part_idx, int stg_idx, const char** strategy, supportedOps_t* supportedOps,
+                             acceptSubgraph_t* acceptSubgraph, const char** op_name);
 
 #define MXLIB_PARTCALLSUPPORTEDOPS_STR "_partCallSupportedOps"
-typedef int (*partCallSupportedOps_t)(supportedOps_t, const char*, int, int *,
-                                      const char* const*, const char* const*, int);
+typedef int (*partCallSupportedOps_t)(supportedOps_t supportedOps, const char *json,
+                                      int num_ids, int *ids, const char* const* opt_keys,
+                                      const char* const* opt_vals, int num_opts);
+
 #define MXLIB_PARTCALLACCEPTSUBGRAPH_STR "_partCallAcceptSubgraph"
-typedef int (*partCallAcceptSubgraph_t)(acceptSubgraph_t acceptSubgraph,
-                                        const char *json, int subgraph_id,
-                                        int *accept, const char* const*,
-                                        const char* const*, int,
-                                        char***, char***, int*);
+typedef int (*partCallAcceptSubgraph_t)(acceptSubgraph_t acceptSubgraph, const char *json,
+                                        int subgraph_id, int *accept, const char* const* opt_keys,
+                                        const char* const* opt_vals, int num_opts,
+                                        char*** attr_keys, char*** attr_vals, int *num_attrs);
 
 #define MXLIB_INITIALIZE_STR "initialize"
-typedef int (*initialize_t)(int);
+typedef int (*initialize_t)(int version);
 
 Review comment:
  Thanks for putting in all the parameter names; this makes the code much more readable/maintainable


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369155919
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -563,11 +587,21 @@ int MXLoadLib(const char *path) {
       }
 
       // create a pointer to hold custom op state object
+      // only create one stateful op depending on passing context
+      // user can add new supported context and call to custom library
       void* state_op_inst = nullptr;
-      CHECK(callCreateOpState(create_opstate_fp, attr_keys.data(), attr_vals.data(),
-                              attr_keys.size(), &state_op_inst))
-      << "Error calling CreateOpState for custom operator '" << name_str << "'";
-
+      if (ctx.dev_mask() == Context::kCPU) {
+        CHECK(createop_map.count("cpu") > 0) << "CPU CreateOpState not implemented";
 
 Review comment:
  this is good, but can we make the error message clearer by adding the operator name at the end, so the message reads more like: CPU CreateOpState is not implemented for operator zzzzz


[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r373286058
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -594,26 +657,58 @@ class CustomOp {
     mutate_inputs = func;
     return *this;
   }
-  CustomOp& setCreateOpState(createOpState_t func) {
-    create_opstate = func;
+  CustomOp& setCreateOpState(createOpState_t func, const char* ctx) {
+    if (create_op_ctx_map.count(ctx) > 0)
+      raiseDuplicateContextError();
+    create_op_ctx_map[ctx] = func;
     return *this;
   }
   CustomOp& setIsSubgraphOp() {
     isSGop = true;
     return *this;
   }
+  void mapToVector() {
+    for (auto kv : forward_ctx_map) {
+      forward_ctx_cstr.push_back(kv.first);
+      forward_fp.push_back(kv.second);
+    }
+    for (auto kv : backward_ctx_map) {
+      backward_ctx_cstr.push_back(kv.first);
+      backward_fp.push_back(kv.second);
+    }
+    for (auto kv : create_op_ctx_map) {
+      create_op_ctx_cstr.push_back(kv.first);
+      create_op_fp.push_back(kv.second);
+    }
+  }
+  ~CustomOp() {}
 
   /*! \brief operator name */
   const char* name;
+
   /*! \brief operator functions */
-  fcomp_t forward;
-  fcomp_t backward;
   parseAttrs_t parse_attrs;
   inferType_t infer_type;
   inferShape_t infer_shape;
   mutateInputs_t mutate_inputs;
-  createOpState_t create_opstate;
   bool isSGop;
+
+  /*! \brief vector repr of ctx map to be easily loaded from c_api */
+  std::vector<const char*> forward_ctx_cstr, backward_ctx_cstr, create_op_ctx_cstr;
+  std::vector<fcomp_t> forward_fp, backward_fp;
+  std::vector<createOpState_t> create_op_fp;
+
+ private:
+  void raiseDuplicateContextError() {
+    std::string op_name_str(name);
+    throw std::runtime_error(
+      "Error! Error! Cannot register multiple functions under same context for operator '"
+      + op_name_str + "'");
+  }
+
+  /*! \brief dedup context maps - static string ctx to custom function */
+  std::unordered_map<const char*, fcomp_t> forward_ctx_map, backward_ctx_map;
 
 Review comment:
   Thank you! I see.


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367711041
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -763,9 +818,9 @@ typedef int (*opCallInferType_t)(inferType_t, const char* const*, const char* co
 
 #define MXLIB_OPCALLFCOMP_STR "_opCallFCompute"
 typedef int (*opCallFComp_t)(fcomp_t, const char* const*, const char* const*, int,
-                             const int64_t**, int*, void**, int*, size_t*, int,
-                             const int64_t**, int*, void**, int*, size_t*, int,
-                             xpu_malloc_t, void*);
+                             const int64_t**, int*, void**, int*, size_t*, char**, int*, int,
+                             const int64_t**, int*, void**, int*, size_t*, char**, int*, int,
+                             xpu_malloc_t, void*, void*);
 
 Review comment:
  let's refactor and use the same function signature as the actual function later on


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369381816
 
 

 ##########
 File path: example/extensions/lib_custom_op/relu_lib.cu
 ##########
 @@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * Copyright (c) 2020 by Contributors
+ * \file relu_lib.cu
+ * \brief simple custom relu operator implemented using CUDA function
+ */
+
+#include <iostream>
+#include "lib_api.h"
+
+__global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? in[tid] : 0;
+}
+
+__global__ void relu_gpu_backward(float *out, float *in, int64_t N) {
+    int tid = blockIdx.x * blockDim.x + threadIdx.x;
+    if (tid < N)
+        out[tid] = in[tid] > 0 ? 1 : 0;
+}
+
+MXReturnValue forwardCPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? in_data[i] : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue backwardCPU(std::map<std::string, std::string> attrs,
+                          std::vector<MXTensor> inputs,
+                          std::vector<MXTensor> outputs,
+                          OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+    for (int i=0; i<inputs[0].size(); i++) {
+        out_data[i] = in_data[i] > 0 ? 1 : 0;
+    }
+    return MX_SUCCESS;
+}
+
+MXReturnValue forwardGPU(std::map<std::string, std::string> attrs,
+                         std::vector<MXTensor> inputs,
+                         std::vector<MXTensor> outputs,
+                         OpResource res) {
+    float* in_data = inputs[0].data<float>();
+    float* out_data = outputs[0].data<float>();
+
+    // test on memory resource allocation
+    void *workspace_cpu = res.alloc_cpu(8 * sizeof(float));
+    void *workspace_gpu = res.alloc_gpu(8 * sizeof(float));
 
 Review comment:
  It's great that you're exercising these functions, but you don't do anything with the allocated space. Let's either use it for something, or not allocate at all.
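  One way to actually exercise the GPU workspace would be a round trip through it (a sketch only, assuming the tensors live on the GPU and have at least 8 elements; not necessarily what the example should end up doing):
  ```
  // stage the first 8 floats of the input through the scratch buffer and back,
  // just to prove the allocation is usable; cudaMemcpy here is synchronous
  float* scratch = static_cast<float*>(res.alloc_gpu(8 * sizeof(float)));
  cudaMemcpy(scratch, in_data, 8 * sizeof(float), cudaMemcpyDeviceToDevice);
  cudaMemcpy(out_data, scratch, 8 * sizeof(float), cudaMemcpyDeviceToDevice);
  ```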


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369387915
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -99,7 +99,135 @@ inline int MXAPIGetFunctionRegInfo(const FunRegType *e,
 // NOTE: return value is added in API_END
 
 /*!
- * \brief Loads dynamic library and initializes it
+ * \brief Common compute function dispatcher for forward/backward and stateful forward/backward
+ * state_ptr will be nullptr for regular ops; fcomp_fp is nullptr for stateful ops
+ */
+void CustomFComputeDispatcher(const std::string op_name,
+                              const opCallFComp_t callFComp,
+                              const fcomp_t fcomp_fp,
+                              const nnvm::NodeAttrs* attrs,
+                              const opCallFStatefulComp_t callFStatefulComp,
+                              int stateful_forward_flag,
+                              const OpStatePtr* state_ptr,
+                              const OpContext& ctx,
+                              const std::vector<NDArray>& inputs,
+                              const std::vector<OpReqType>& req,
+                              const std::vector<NDArray>& outputs) {
+  std::vector<void*> in_data, out_data;
+  std::vector<const int64_t *> in_shapes, out_shapes;
+  std::vector<int> in_dims, out_dims;
+  std::vector<int> in_types, out_types;
+  std::vector<size_t> in_verIDs, out_verIDs;
+  std::vector<const char*> in_dev_type, out_dev_type;
+  std::vector<int> in_dev_id, out_dev_id;
+
+  // convert inputs/outpus NDArray to C types to be passed to lib_api.h
+  for (size_t i = 0; i < inputs.size(); i++) {
+    in_data.push_back(inputs[i].data().dptr_);
+    in_shapes.push_back(inputs[i].shape().data());
+    in_dims.push_back(inputs[i].shape().ndim());
+    in_types.push_back(inputs[i].dtype());
+    in_verIDs.push_back(inputs[i].version());
+    const char* ctx_str = inputs[i].ctx().dev_mask() == Context::kCPU ? "cpu" : "gpu";
+    in_dev_type.push_back(ctx_str);
+    in_dev_id.push_back(inputs[i].ctx().real_dev_id());
+  }
+
+  for (size_t i = 0; i < outputs.size(); i++) {
+    out_data.push_back(outputs[i].data().dptr_);
+    out_shapes.push_back(outputs[i].shape().data());
+    out_dims.push_back(outputs[i].shape().ndim());
+    out_types.push_back(outputs[i].dtype());
+    out_verIDs.push_back(outputs[i].version());
+    const char* ctx_str = outputs[i].ctx().dev_mask() == Context::kCPU ? "cpu" : "gpu";
+    out_dev_type.push_back(ctx_str);
+    out_dev_id.push_back(outputs[i].ctx().real_dev_id());
+  }
+
+  // get memory resource and mxnet backend streams
+  const Resource &resource = ctx.requested[0];
+  mshadow::Stream<mxnet::cpu> *cpu_stream = ctx.get_stream<mxnet::cpu>();
+  mshadow::Stream<mxnet::gpu> *gpu_stream = ctx.get_stream<mxnet::gpu>();
+
+  // create lambda that captures stream & resource objects
+  // this temp workspace holds memory allocated by custom library via OpResource
+  auto cpu_alloc = [&](int size) {
+    mshadow::Tensor<mxnet::cpu, 1, char> workspace =
+      resource.get_space_typed<mxnet::cpu, 1, char>(mshadow::Shape1(size), cpu_stream);
+    return workspace.dptr_;
+  };
+  auto gpu_alloc = [&](int size) {
+    mshadow::Tensor<mxnet::gpu, 1, char> workspace =
+      resource.get_space_typed<mxnet::gpu, 1, char>(mshadow::Shape1(size), gpu_stream);
+    return workspace.dptr_;
+  };
+
+  // create lambda without captures so that we can cast it to function pointer
+  // this needs to be a lambda function so that we can do the decltype cast
+  typedef decltype(cpu_alloc) alloc_type_cpu;
+  auto cpu_malloc = [](void* _cpu_alloc, int size) {
+    // cast the void* argument to the type for the cpu_alloc lambda function
+    alloc_type_cpu* cpualloc = static_cast<alloc_type_cpu*>(_cpu_alloc);
+    // call cpu_alloc to actually allocate memory and get the pointer
+    void* ptr = (*cpualloc)(size);
+    return ptr;
+  };
+  typedef decltype(gpu_alloc) alloc_type_gpu;
+  auto gpu_malloc = [](void* _gpu_alloc, int size) {
+    alloc_type_gpu* gpualloc = static_cast<alloc_type_gpu*>(_gpu_alloc);
+    void* ptr = (*gpualloc)(size);
+    return ptr;
+  };
+
+  // get actual cudaStream_t out of mxnet gpu stream and pass to lib_api.h
+  void *cuda_stream = nullptr;
+  if (inputs[0].ctx().dev_mask() == Context::kGPU) {
+    cuda_stream = static_cast<void*>(mshadow::Stream<gpu>::GetStream(gpu_stream));
+  }
+
+  CHECK((fcomp_fp != nullptr && state_ptr == nullptr)
+        || (fcomp_fp == nullptr && state_ptr != nullptr))
+  << "Can only register either regular op or stateful op for '" << op_name << "'";
+
+  if (fcomp_fp != nullptr) {
+    // convert attributes to vector of char*
+    std::vector<const char*> attr_keys, attr_vals;
+    for (auto kv : attrs->dict) {
+      attr_keys.push_back(kv.first.c_str());
+      attr_vals.push_back(kv.second.c_str());
+    }
+    // call fcompute function
+    CHECK(callFComp(fcomp_fp, attr_keys.data(), attr_vals.data(), attr_keys.size(),
+                    in_shapes.data(), in_dims.data(), in_data.data(), in_types.data(),
+                    in_verIDs.data(), in_dev_type.data(), in_dev_id.data(), in_data.size(),
+                    out_shapes.data(), out_dims.data(), out_data.data(), out_types.data(),
+                    out_verIDs.data(), out_dev_type.data(), out_dev_id.data(), out_data.size(),
+                    cpu_malloc, &cpu_alloc, gpu_malloc, &gpu_alloc, cuda_stream))
+    << "Error calling FCompute for custom operator '" << op_name << "'";
+  }
+
+  if (state_ptr != nullptr) {
+    // retrieve op state object created from CreateOpState
+    CustomStatefulOpWrapper& op = state_ptr->get_state<CustomStatefulOpWrapper>();
+    CustomStatefulOp* state_op_inst = op.get_instance();
+    CHECK(state_op_inst != nullptr)
+    << "Error MXNet cannot load custom stateful operator'" << op_name << "'";
 
 Review comment:
  this error message makes it seem like there was a problem with loading the library/operator, when actually it's a problem with MXNet being unable to retrieve the op_state that should have been created already. Maybe the message should say: "Error! op_state is null for operator '" << op_name
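  i.e. something along these lines (sketch; the wording is only a suggestion):
  ```
  CHECK(state_op_inst != nullptr)
    << "Error! op_state is null for operator '" << op_name << "'";
  ```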


[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372603146
 
 

 ##########
 File path: tests/python/unittest/test_extensions.py
 ##########
 @@ -148,3 +156,47 @@ def test_subgraph():
     out3 = exe3.forward()
     # check that result matches one executed by MXNet
     assert_almost_equal(out[0].asnumpy(), out3[0].asnumpy(), rtol=1e-3, atol=1e-3)
+
+@unittest.skipIf(check_platform(), "not all machine types supported")
+@unittest.skipIf(is_cd_run(), "continuous delivery run - ignoring test")
 
 Review comment:
  Why is this test skipped on the CD run?


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367703072
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -450,32 +480,59 @@ int MXLoadLib(const char *path) {
         return ptr;
       };
 
+      // pass the gpu stream associated with the context to custom library
+      void* gpu_stream = nullptr;
+      if (inputs[i].ctx().dev_mask() == Context::kGPU) {
 
 Review comment:
   same as before for fstateful


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369153621
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -164,39 +186,60 @@ int MXLoadLib(const char *path) {
   for (int i = 0; i < numOps; i++) {
     const char* name;
     // function pointers holding implementation from custom library
-    fcomp_t fcomp_fp = nullptr;
     parseAttrs_t parse_fp = nullptr;
     inferType_t type_fp = nullptr;
     inferShape_t shape_fp = nullptr;
     // optional attributes
-    fcomp_t fgrad_fp = nullptr;
     mutateInputs_t mutate_fp = nullptr;
-    createOpState_t create_opstate_fp = nullptr;
     bool isSubgraphOp = false;
     int _isSubgraphOp = 0;
-
-    // get custom operator implemenation from the dynamic library
-    opRegGet(i, &name, &fcomp_fp, &fgrad_fp, &parse_fp, &type_fp, &shape_fp,
-             &mutate_fp, &create_opstate_fp, &_isSubgraphOp);
+    // lists of forward and backward function associated with each context
+    const char **forward_ctx, **backward_ctx, **createop_ctx;
+    fcomp_t *forward_fcomp, *backward_fcomp;
+    createOpState_t *createop_fp;
+    int forward_count, backward_count, createop_count;
+
+    // main function to get custom operator implemenation from the custom library
+    opRegGet(i, &name, &_isSubgraphOp,
+             &forward_ctx, &forward_fcomp, &forward_count,
+             &backward_ctx, &backward_fcomp, &backward_count,
+             &createop_ctx, &createop_fp, &createop_count,
+             &parse_fp, &type_fp, &shape_fp, &mutate_fp);
+
+    // construct maps of context to forward/backward custom library function
+    std::unordered_map<std::string, fcomp_t> forward_ctx_map;
+    std::unordered_map<std::string, fcomp_t> backward_ctx_map;
+    std::unordered_map<std::string, createOpState_t> createop_map;
+    for (int i=0; i<forward_count; i++) {
+      std::string ctx_str(forward_ctx[i]);
+      forward_ctx_map[ctx_str] = forward_fcomp[i];
 
 Review comment:
  should we check whether `ctx_str` is already in `forward_ctx_map` and error out if it is (since that means the user registered the same context twice)?
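  A sketch of what that check could look like, using the names already in scope in this diff (hypothetical, not the PR's code):
  ```
  std::string ctx_str(forward_ctx[i]);
  CHECK(forward_ctx_map.count(ctx_str) == 0)
    << "Context '" << ctx_str << "' registered twice for operator '" << name << "'";
  forward_ctx_map[ctx_str] = forward_fcomp[i];
  ```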


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369386321
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -1056,26 +1171,30 @@ extern "C" {
   int
 #endif
   _opCallFStatefulCompute(int is_forward, void* state_op,
-                          const int64_t** inshapes, int* indims,
-                          void** indata, int* intypes, size_t* inIDs, int num_in,
-                          const int64_t** outshapes, int* outdims,
-                          void** outdata, int* outtypes, size_t* outIDs, int num_out,
-                          xpu_malloc_t cpu_malloc, void* cpu_alloc) {
+                          const int64_t** inshapes, int* indims, void** indata, int* intypes,
+                          size_t* inIDs, const char** indev_type, int* indev_id, int num_in,
+                          const int64_t** outshapes, int* outdims, void** outdata, int* outtypes,
+                          size_t* outIDs, const char** outdev_type, int* outdev_id, int num_out,
+                          xpu_malloc_t cpu_malloc, void* cpu_alloc,
+                          xpu_malloc_t gpu_malloc, void* gpu_alloc, void* stream) {
     // create a vector of tensors for inputs
     std::vector<MXTensor> inputs(num_in);
     for (int i = 0; i < num_in; i++) {
-      inputs[i].setTensor(indata[i], (MXDType)intypes[i], inshapes[i], indims[i], inIDs[i]);
+      inputs[i].setTensor(indata[i], (MXDType)intypes[i], inshapes[i], indims[i],
+                          inIDs[i], {indev_type[i], indev_id[i]});
     }
 
     // create a vector of tensors for outputs
     std::vector<MXTensor> outputs(num_out);
     for (int i = 0; i < num_out; i++) {
       outputs[i].setTensor(outdata[i], (MXDType)outtypes[i], outshapes[i], outdims[i],
-                           outIDs[i]);
+                           outIDs[i], {outdev_type[i], outdev_id[i]});
     }
-    OpResource res(cpu_malloc, cpu_alloc);
+
+    OpResource res(cpu_malloc, cpu_alloc, gpu_malloc, gpu_alloc, stream);
+
     CustomStatefulOp* op_ptr = reinterpret_cast<CustomStatefulOp*>(state_op);
-    if (is_forward) {
+    if (is_forward == 1) {
 
 Review comment:
  the equality comparison is not necessary since you're passing either a 1 or a 0 from c_api.cc, and `if (0)` evaluates to false


[GitHub] [incubator-mxnet] wkcn commented on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
wkcn commented on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-580570282
 
 
   Merged. : )


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369387057
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -99,7 +99,135 @@ inline int MXAPIGetFunctionRegInfo(const FunRegType *e,
 // NOTE: return value is added in API_END
 
 /*!
- * \brief Loads dynamic library and initializes it
+ * \brief Common compute function dispatcher for forward/backward and stateful forward/backward
+ * state_ptr will be nullptr for regular ops; fcomp_fp is nullptr for stateful ops
+ */
+void CustomFComputeDispatcher(const std::string op_name,
+                              const opCallFComp_t callFComp,
+                              const fcomp_t fcomp_fp,
+                              const nnvm::NodeAttrs* attrs,
+                              const opCallFStatefulComp_t callFStatefulComp,
+                              int stateful_forward_flag,
+                              const OpStatePtr* state_ptr,
+                              const OpContext& ctx,
+                              const std::vector<NDArray>& inputs,
+                              const std::vector<OpReqType>& req,
+                              const std::vector<NDArray>& outputs) {
+  std::vector<void*> in_data, out_data;
+  std::vector<const int64_t *> in_shapes, out_shapes;
+  std::vector<int> in_dims, out_dims;
+  std::vector<int> in_types, out_types;
+  std::vector<size_t> in_verIDs, out_verIDs;
+  std::vector<const char*> in_dev_type, out_dev_type;
+  std::vector<int> in_dev_id, out_dev_id;
+
+  // convert inputs/outpus NDArray to C types to be passed to lib_api.h
+  for (size_t i = 0; i < inputs.size(); i++) {
+    in_data.push_back(inputs[i].data().dptr_);
+    in_shapes.push_back(inputs[i].shape().data());
+    in_dims.push_back(inputs[i].shape().ndim());
+    in_types.push_back(inputs[i].dtype());
+    in_verIDs.push_back(inputs[i].version());
+    const char* ctx_str = inputs[i].ctx().dev_mask() == Context::kCPU ? "cpu" : "gpu";
+    in_dev_type.push_back(ctx_str);
+    in_dev_id.push_back(inputs[i].ctx().real_dev_id());
+  }
+
+  for (size_t i = 0; i < outputs.size(); i++) {
+    out_data.push_back(outputs[i].data().dptr_);
+    out_shapes.push_back(outputs[i].shape().data());
+    out_dims.push_back(outputs[i].shape().ndim());
+    out_types.push_back(outputs[i].dtype());
+    out_verIDs.push_back(outputs[i].version());
+    const char* ctx_str = outputs[i].ctx().dev_mask() == Context::kCPU ? "cpu" : "gpu";
+    out_dev_type.push_back(ctx_str);
+    out_dev_id.push_back(outputs[i].ctx().real_dev_id());
+  }
+
+  // get memory resource and mxnet backend streams
+  const Resource &resource = ctx.requested[0];
+  mshadow::Stream<mxnet::cpu> *cpu_stream = ctx.get_stream<mxnet::cpu>();
+  mshadow::Stream<mxnet::gpu> *gpu_stream = ctx.get_stream<mxnet::gpu>();
+
+  // create lambda that captures stream & resource objects
+  // this temp workspace holds memory allocated by custom library via OpResource
+  auto cpu_alloc = [&](int size) {
+    mshadow::Tensor<mxnet::cpu, 1, char> workspace =
+      resource.get_space_typed<mxnet::cpu, 1, char>(mshadow::Shape1(size), cpu_stream);
+    return workspace.dptr_;
+  };
+  auto gpu_alloc = [&](int size) {
+    mshadow::Tensor<mxnet::gpu, 1, char> workspace =
+      resource.get_space_typed<mxnet::gpu, 1, char>(mshadow::Shape1(size), gpu_stream);
+    return workspace.dptr_;
+  };
+
+  // create lambda without captures so that we can cast it to function pointer
+  // this needs to be a lambda function so that we can do the decltype cast
+  typedef decltype(cpu_alloc) alloc_type_cpu;
+  auto cpu_malloc = [](void* _cpu_alloc, int size) {
+    // cast the void* argument to the type for the cpu_alloc lambda function
+    alloc_type_cpu* cpualloc = static_cast<alloc_type_cpu*>(_cpu_alloc);
+    // call cpu_alloc to actually allocate memory and get the pointer
+    void* ptr = (*cpualloc)(size);
+    return ptr;
+  };
+  typedef decltype(gpu_alloc) alloc_type_gpu;
+  auto gpu_malloc = [](void* _gpu_alloc, int size) {
+    alloc_type_gpu* gpualloc = static_cast<alloc_type_gpu*>(_gpu_alloc);
+    void* ptr = (*gpualloc)(size);
+    return ptr;
+  };
+
+  // get actual cudaStream_t out of mxnet gpu stream and pass to lib_api.h
+  void *cuda_stream = nullptr;
+  if (inputs[0].ctx().dev_mask() == Context::kGPU) {
+    cuda_stream = static_cast<void*>(mshadow::Stream<gpu>::GetStream(gpu_stream));
+  }
+
+  CHECK((fcomp_fp != nullptr && state_ptr == nullptr)
+        || (fcomp_fp == nullptr && state_ptr != nullptr))
+  << "Can only register either regular op or stateful op for '" << op_name << "'";
+
+  if (fcomp_fp != nullptr) {
+    // convert attributes to vector of char*
+    std::vector<const char*> attr_keys, attr_vals;
+    for (auto kv : attrs->dict) {
+      attr_keys.push_back(kv.first.c_str());
+      attr_vals.push_back(kv.second.c_str());
+    }
+    // call fcompute function
+    CHECK(callFComp(fcomp_fp, attr_keys.data(), attr_vals.data(), attr_keys.size(),
+                    in_shapes.data(), in_dims.data(), in_data.data(), in_types.data(),
+                    in_verIDs.data(), in_dev_type.data(), in_dev_id.data(), in_data.size(),
+                    out_shapes.data(), out_dims.data(), out_data.data(), out_types.data(),
+                    out_verIDs.data(), out_dev_type.data(), out_dev_id.data(), out_data.size(),
+                    cpu_malloc, &cpu_alloc, gpu_malloc, &gpu_alloc, cuda_stream))
+    << "Error calling FCompute for custom operator '" << op_name << "'";
 
 Review comment:
   should we make this unique for the stateful operator? like: "Error calling FCompute for custom stateful operator '" << op_name << "'"


[GitHub] [incubator-mxnet] samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-578355722
 
 
   @rondogency can you try this:
   ```
   template <class T>
   class Registry {
    public:
     /*!
      * \brief get singleton pointer to class
      * \returns pointer to class
      */
     static Registry* get() {
       static Registry inst 
   #if !defined(_WIN32) && !defined(_WIN64) && !defined(__WINDOWS__)
   __attribute__ ((visibility ("hidden")));
   #else
   ;
   #endif
       return &inst;
     }
   ```


[GitHub] [incubator-mxnet] wkcn merged pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
wkcn merged pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270
 
 
   


[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369384055
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -328,16 +358,31 @@ typedef void* (*xpu_malloc_t)(void*, int);
  */
 class OpResource {
  public:
-  OpResource(xpu_malloc_t cm, void* ca) : cpu_malloc(cm), cpu_alloc(ca) {}
+  OpResource(xpu_malloc_t cm, void* ca, xpu_malloc_t gm, void* ga, void* st)
+    : cpu_malloc(cm), gpu_malloc(gm), cpu_alloc(ca), gpu_alloc(ga), cuda_stream(st) {}
 
   /*! \brief allocate memory controlled by MXNet */
-  void* alloc(int size) {
+  void* alloc_cpu(int size) {
     return cpu_malloc(cpu_alloc, size);
   }
 
+  /*! \brief allocate memory controlled by MXNet */
 
 Review comment:
   can we change the comment to "allocate gpu memory"?


[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367731063
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -763,9 +818,9 @@ typedef int (*opCallInferType_t)(inferType_t, const char* const*, const char* co
 
 #define MXLIB_OPCALLFCOMP_STR "_opCallFCompute"
 typedef int (*opCallFComp_t)(fcomp_t, const char* const*, const char* const*, int,
-                             const int64_t**, int*, void**, int*, size_t*, int,
-                             const int64_t**, int*, void**, int*, size_t*, int,
-                             xpu_malloc_t, void*);
+                             const int64_t**, int*, void**, int*, size_t*, char**, int*, int,
+                             const int64_t**, int*, void**, int*, size_t*, char**, int*, int,
+                             xpu_malloc_t, void*, void*);
 
 Review comment:
   agree


[GitHub] [incubator-mxnet] samskalicky commented on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-579340547
 
 
  @rondogency looks like the Windows build/test is working now with those CMake changes:
   ```
  test_operator_gpu.test_custom_op_gpu ... 
  MXNet version 10600 supported
  [10:16:08] C:\jenkins_slave\workspace\build-gpu\src\c_api\c_api.cc:286: Found 2 operators in library
  [10:16:08] C:\jenkins_slave\workspace\build-gpu\src\c_api\c_api.cc:350: 	Op[0] my_relu
  [10:16:08] C:\jenkins_slave\workspace\build-gpu\src\c_api\c_api.cc:350: 	Op[1] my_state_relu
  [10:16:08] C:\jenkins_slave\workspace\build-gpu\src\c_api\c_api.cc:785: Found 0 partitioners in library
  ok (0.6834s)
   ```
   
   Now we just need to work through the flaky tests

[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367713125
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -833,20 +891,26 @@ extern "C" {
 #else
   void
 #endif
-  _opRegGet(int idx, const char** name, fcomp_t* fcomp, fcomp_t* fgrad,
+  _opRegGet(int idx, const char** name,
+            const char*** forward_ctx, fcomp_t** forward_fp, int* forward_count,
+            const char*** backward_ctx, fcomp_t** backward_fp, int* backward_count,
             parseAttrs_t* parse, inferType_t* type,
             inferShape_t* shape, mutateInputs_t* mutate,
             createOpState_t* create_op, int *isSGop) {
-    CustomOp op = Registry<CustomOp>::get()->get(idx);
-    *name = op.name;
-    *fcomp = op.forward;
-    *fgrad = op.backward;
-    *parse = op.parse_attrs;
-    *type = op.infer_type;
-    *shape = op.infer_shape;
-    *mutate = op.mutate_inputs;
-    *create_op = op.create_opstate;
-    *isSGop = op.isSGop;
+    CustomOp *op = &(Registry<CustomOp>::get()->get(idx));
 
 Review comment:
   maybe try:
   ```
       CustomOp& op = Registry<CustomOp>::get()->get(idx);
   ```
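
  For illustration, a sketch of how the surrounding assignments would then read. The member names are the ones from the removed lines above; the new forward/backward context and function arrays added by this PR are filled in separately:
  ```
      CustomOp& op = Registry<CustomOp>::get()->get(idx);
      *name = op.name;
      *parse = op.parse_attrs;
      *type = op.infer_type;
      *shape = op.infer_shape;
      *mutate = op.mutate_inputs;
      *isSGop = op.isSGop;
      // (forward_ctx/forward_fp and backward_ctx/backward_fp are populated separately)
  ```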

[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367696078
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -215,25 +225,43 @@ struct MXTensor {
   MXTensor() : data_ptr(NULL), dtype(kUNSET), verID(0) {}
 
   MXTensor(void *data_ptr, const std::vector<int64_t> &shape, MXDType dtype,
-           size_t vID)
-  : data_ptr(data_ptr), shape(shape), dtype(dtype), verID(vID) {}
+           size_t vID, MXContext mx_ctx)
+  : data_ptr(data_ptr), shape(shape), dtype(dtype), verID(vID), ctx(mx_ctx) {}
 
   /*! \brief populate internal tensor fields */
-  void setTensor(void *dptr, MXDType type, const int64_t* dims,
-                 int ndims, size_t vID) {
-    data_ptr = dptr; dtype = type; verID = vID;
+  void setTensor(void *dptr, MXDType type, const int64_t* dims, int ndims,
+                 size_t vID, MXContext mx_ctx) {
+    data_ptr = dptr; dtype = type; verID = vID; ctx = mx_ctx;
     shape.clear();
     for (int j = 0; j < ndims; j++) {
       shape.push_back(dims[j]);
     }
-    setDLTensor();
+    DLDeviceType dltype;
+    if (ctx.dev_type == "cpu")
+      dltype = kDLCPU;
+    else if (ctx.dev_type == "gpu")
+      dltype = kDLGPU;
+    else if (ctx.dev_type == "opencl")
+      dltype = kDLOpenCL;
+    else if (ctx.dev_type == "vulcan")
+      dltype = kDLVulkan;
+    else if (ctx.dev_type == "metal")
+      dltype = kDLMetal;
+    else if (ctx.dev_type == "vpi")
+      dltype = kDLVPI;
+    else if (ctx.dev_type == "rocm")
+      dltype = kDLROCM;
+    else
+      dltype = kDLExtDev;
 
 Review comment:
  Should this be the catch-all else, or an explicit check for "ext"?
  
  Also, should we move this mapping into the setDLTensor function?
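
  For what it's worth, a sketch of pulling the mapping into a small helper that setDLTensor could call. The helper name is made up for this sketch; the cases (including the "vulcan" spelling) mirror the diff above:
  ```
    // hypothetical helper; maps the context string to the DLPack device type
    static DLDeviceType dltypeForDevice(const std::string& dev_type) {
      if (dev_type == "cpu")    return kDLCPU;
      if (dev_type == "gpu")    return kDLGPU;
      if (dev_type == "opencl") return kDLOpenCL;
      if (dev_type == "vulcan") return kDLVulkan;
      if (dev_type == "metal")  return kDLMetal;
      if (dev_type == "vpi")    return kDLVPI;
      if (dev_type == "rocm")   return kDLROCM;
      return kDLExtDev;  // anything unrecognized falls back to the external device type
    }
  ```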

[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r373270975
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -594,26 +657,58 @@ class CustomOp {
     mutate_inputs = func;
     return *this;
   }
-  CustomOp& setCreateOpState(createOpState_t func) {
-    create_opstate = func;
+  CustomOp& setCreateOpState(createOpState_t func, const char* ctx) {
+    if (create_op_ctx_map.count(ctx) > 0)
+      raiseDuplicateContextError();
+    create_op_ctx_map[ctx] = func;
     return *this;
   }
   CustomOp& setIsSubgraphOp() {
     isSGop = true;
     return *this;
   }
+  void mapToVector() {
+    for (auto kv : forward_ctx_map) {
+      forward_ctx_cstr.push_back(kv.first);
+      forward_fp.push_back(kv.second);
+    }
+    for (auto kv : backward_ctx_map) {
+      backward_ctx_cstr.push_back(kv.first);
+      backward_fp.push_back(kv.second);
+    }
+    for (auto kv : create_op_ctx_map) {
+      create_op_ctx_cstr.push_back(kv.first);
+      create_op_fp.push_back(kv.second);
+    }
+  }
+  ~CustomOp() {}
 
   /*! \brief operator name */
   const char* name;
+
   /*! \brief operator functions */
-  fcomp_t forward;
-  fcomp_t backward;
   parseAttrs_t parse_attrs;
   inferType_t infer_type;
   inferShape_t infer_shape;
   mutateInputs_t mutate_inputs;
-  createOpState_t create_opstate;
   bool isSGop;
+
+  /*! \brief vector repr of ctx map to be easily loaded from c_api */
+  std::vector<const char*> forward_ctx_cstr, backward_ctx_cstr, create_op_ctx_cstr;
+  std::vector<fcomp_t> forward_fp, backward_fp;
+  std::vector<createOpState_t> create_op_fp;
+
+ private:
+  void raiseDuplicateContextError() {
+    std::string op_name_str(name);
+    throw std::runtime_error(
+      "Error! Error! Cannot register multiple functions under same context for operator '"
+      + op_name_str + "'");
+  }
+
+  /*! \brief dedup context maps - static string ctx to custom function */
+  std::unordered_map<const char*, fcomp_t> forward_ctx_map, backward_ctx_map;
 
 Review comment:
  The keys of `forward_ctx_map`, `backward_ctx_map` and `create_op_ctx_map` are the pointer values of the strings, rather than their contents. Pointers to the same text are identical within one library, but they differ across libraries, so a lookup keyed on a pointer from another library will miss. Using `std::string` as the key may be better.
  
  Everything else looks good to me : )
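
  To make the failure mode concrete, here is a small standalone illustration (not MXNet code) of the difference between `const char*` and `std::string` keys:
  ```
  #include <cstring>
  #include <iostream>
  #include <string>
  #include <unordered_map>
  
  int main() {
    std::unordered_map<const char*, int> by_ptr;  // hashes/compares the pointer value
    std::unordered_map<std::string, int> by_str;  // hashes/compares the characters
  
    const char* a = "gpu";
    char b[4];
    std::strcpy(b, "gpu");  // same text, different address (as if it came from another library)
  
    by_ptr[a] = 1;
    by_str[a] = 1;
  
    std::cout << by_ptr.count(b) << std::endl;  // 0 -- lookup with a different pointer misses
    std::cout << by_str.count(b) << std::endl;  // 1 -- lookup by content matches
  }
  ```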

[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
wkcn commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372696099
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -203,6 +214,16 @@ enum MXDType {
   kUNSET = 100,
 };
 
+/*!
+ * \brief Context info passing from MXNet OpContext
+ * dev_type is string repr of supported context, currently only "cpu" and "gpu"
+ * dev_id is the device index where the tensor locates
+ */
+typedef struct {
+  std::string dev_type;
 
 Review comment:
  I worry about the ABI compatibility of std::string, since MXNet and the custom library may be built with different compiler versions or even different compilers, e.g. GCC 5 vs. GCC 9, or GCC vs. Clang, and their std::string implementations may differ. A solution is to use a C string (char*).
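
  For illustration, a sketch of the C-string variant of the struct. The struct and field names follow the diff and doc comment above; keeping only plain C types in the layout is the point of the suggestion:
  ```
  /*!
   * \brief sketch: context info with no std:: types in its layout,
   * so it does not depend on either side's C++ standard library
   */
  typedef struct {
    const char* dev_type;  // e.g. "cpu" or "gpu"
    int dev_id;            // device index where the tensor lives
  } MXContext;
  ```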

[GitHub] [incubator-mxnet] rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
rondogency commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r374977594
 
 

 ##########
 File path: example/extensions/lib_custom_op/test_relu.py
 ##########
 @@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# coding: utf-8
+# pylint: disable=arguments-differ
+
+# This test checks dynamic loading of custom library into MXNet
+# and checks end to end compute of a simple 2D gemm custom op
+
+import mxnet as mx
+import os
+import time
+
+#load library
+if (os.name=='posix'):
+    path = os.path.abspath('librelu_lib.so')
+    mx.library.load(path)
+
+a = mx.nd.array([[-2,-1],[1,2]], ctx=mx.cpu())
+b = mx.nd.array([[-2,-1],[1,2]], ctx=mx.gpu())
+
+print("--------start ndarray compute---------")
+print(mx.nd.my_relu(a))
+print(mx.nd.my_relu(b))
+print(mx.nd.my_state_relu(a))
+print(mx.nd.my_state_relu(b))
+
+print("--------start symbolic compute--------")
+c = mx.sym.Variable('c')
+d = mx.sym.Variable('d')
+e = mx.sym.my_relu(c)
+base = mx.sym.relu(d)
+in_grad = [mx.nd.empty((2,2), ctx=mx.gpu())]
+in_grad_base = [mx.nd.empty((2,2), ctx=mx.gpu())]
+exe = e.bind(ctx=mx.gpu(), args={'c':b}, args_grad=in_grad)
+exe_base = base.bind(ctx=mx.gpu(), args={'d':b}, args_grad=in_grad_base)
+out = exe.forward()
+out_base = exe_base.forward()
+print(out)
+print(out_base)
+
+print("--------start backward compute--------")
+out_grad = mx.nd.ones((2,2), ctx=mx.gpu())
+exe.backward([out_grad])
+exe_base.backward([out_grad])
+print(in_grad)
+print(in_grad_base)
+
+print("--------start testing larger ndarray---------")
+a = mx.nd.uniform(shape=(100,100,100), ctx=mx.cpu())
+b = mx.nd.uniform(shape=(100,100,100), ctx=mx.gpu())
 
 Review comment:
  The fix is in #17516, along with one other fix.

[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
ptrendx commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r370909037
 
 

 ##########
 File path: example/extensions/lib_custom_op/gemm_lib.cc
 ##########
 @@ -203,9 +210,9 @@ class MyStatefulGemm : public CustomStatefulOp {
 
 MXReturnValue createOpState(std::map<std::string, std::string> attrs,
                             CustomStatefulOp** op_inst) {
-  int count = 0;
-  if (attrs.count("test_kw") > 0)
-    count = std::stoi(attrs["test_kw"]);
+  // testing passing of keyward arguments
 
 Review comment:
   Nit: keyword

[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r372637141
 
 

 ##########
 File path: tests/python/unittest/test_extensions.py
 ##########
 @@ -148,3 +156,47 @@ def test_subgraph():
     out3 = exe3.forward()
     # check that result matches one executed by MXNet
     assert_almost_equal(out[0].asnumpy(), out3[0].asnumpy(), rtol=1e-3, atol=1e-3)
+
+@unittest.skipIf(check_platform(), "not all machine types supported")
+@unittest.skipIf(is_cd_run(), "continuous delivery run - ignoring test")
 
 Review comment:
  Looks like this was added in PR #16127, and @rondogency just copied the same pattern from the other test in the test_extensions.py file. We're just following the existing standard. If @perdasilva and @szha want to enable it for the CD run, they can do that in a future PR.

[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369154109
 
 

 ##########
 File path: src/c_api/c_api.cc
 ##########
 @@ -408,24 +452,13 @@ int MXLoadLib(const char *path) {
       std::vector<int> in_dims, out_dims;
       std::vector<int> in_types, out_types;
       std::vector<size_t> in_verIDs, out_verIDs;
+      std::vector<const char*> in_dev_type, out_dev_type;
+      std::vector<int> in_dev_id, out_dev_id;
 
-      // convert input tensors to constituent parts
-      for (size_t i = 0; i < inputs.size(); i++) {
-        in_data.push_back(inputs[i].data().dptr_);
-        in_shapes.push_back(inputs[i].shape().data());
-        in_dims.push_back(inputs[i].shape().ndim());
-        in_types.push_back(inputs[i].dtype());
-        in_verIDs.push_back(inputs[i].version());
-      }
-
-      // convert output tensors to constituent parts
-      for (size_t i = 0; i < outputs.size(); i++) {
-        out_data.push_back(outputs[i].data().dptr_);
-        out_shapes.push_back(outputs[i].shape().data());
-        out_dims.push_back(outputs[i].shape().ndim());
-        out_types.push_back(outputs[i].dtype());
-        out_verIDs.push_back(outputs[i].version());
-      }
+      NDArrayToCTypes(inputs, in_data, in_shapes, in_dims, in_types,
+                      in_verIDs, in_dev_type, in_dev_id);
+      NDArrayToCTypes(outputs, out_data, out_shapes, out_dims, out_types,
+                      out_verIDs, out_dev_type, out_dev_id);
 
 Review comment:
  This is a great refactor! It saves space and adds clarity!
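
  For readers following along, a sketch of what such a helper presumably looks like. The loop body mirrors the removed per-tensor code above; the `ctx()` accessors on the last two lines are assumptions for this sketch, not the exact implementation:
  ```
  void NDArrayToCTypes(const std::vector<NDArray>& arrays,
                       std::vector<void*>& data,
                       std::vector<const int64_t*>& shapes,
                       std::vector<int>& dims,
                       std::vector<int>& types,
                       std::vector<size_t>& verIDs,
                       std::vector<const char*>& dev_types,
                       std::vector<int>& dev_ids) {
    for (const auto& arr : arrays) {
      data.push_back(arr.data().dptr_);      // raw data pointer
      shapes.push_back(arr.shape().data());  // shape array
      dims.push_back(arr.shape().ndim());    // number of dimensions
      types.push_back(arr.dtype());          // MXNet dtype enum
      verIDs.push_back(arr.version());       // NDArray version ID
      // device info (assumed accessors): string tag plus device index
      dev_types.push_back(arr.ctx().dev_mask() == gpu::kDevMask ? "gpu" : "cpu");
      dev_ids.push_back(arr.ctx().dev_id);
    }
  }
  ```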

[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369385193
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -594,26 +640,58 @@ class CustomOp {
     mutate_inputs = func;
     return *this;
   }
-  CustomOp& setCreateOpState(createOpState_t func) {
-    create_opstate = func;
+  CustomOp& setCreateOpState(createOpState_t func, const char* ctx) {
+    if (create_op_ctx_map.count(ctx) > 0)
+      raiseDuplicateContextError();
+    create_op_ctx_map[ctx] = func;
     return *this;
   }
   CustomOp& setIsSubgraphOp() {
     isSGop = true;
     return *this;
   }
+  void mapToVector() {
+    for (auto kv : forward_ctx_map) {
+      forward_ctx_cstr.push_back(kv.first);
+      forward_fp.push_back(kv.second);
+    }
+    for (auto kv : backward_ctx_map) {
+      backward_ctx_cstr.push_back(kv.first);
+      backward_fp.push_back(kv.second);
+    }
+    for (auto kv : create_op_ctx_map) {
+      create_op_ctx_cstr.push_back(kv.first);
+      create_op_fp.push_back(kv.second);
+    }
+  }
+  ~CustomOp() {}
 
   /*! \brief operator name */
   const char* name;
+
   /*! \brief operator functions */
-  fcomp_t forward;
-  fcomp_t backward;
   parseAttrs_t parse_attrs;
   inferType_t infer_type;
   inferShape_t infer_shape;
   mutateInputs_t mutate_inputs;
-  createOpState_t create_opstate;
   bool isSGop;
+
+  /*! \brief vector repr of ctx map to be easily loaded from c_api */
+  std::vector<const char*> forward_ctx_cstr, backward_ctx_cstr, create_op_ctx_cstr;
+  std::vector<fcomp_t> forward_fp, backward_fp;
+  std::vector<createOpState_t> create_op_fp;
+
+ private:
+  void raiseDuplicateContextError() {
+    std::string op_name_str(name);
+    throw std::runtime_error(
+      "Error! Register multiple functions under same context for operator '"
 
 Review comment:
   how about "Error! Cannot register multiple functions under same context for operator '" ...

[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367713381
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -833,20 +891,26 @@ extern "C" {
 #else
   void
 #endif
-  _opRegGet(int idx, const char** name, fcomp_t* fcomp, fcomp_t* fgrad,
+  _opRegGet(int idx, const char** name,
+            const char*** forward_ctx, fcomp_t** forward_fp, int* forward_count,
+            const char*** backward_ctx, fcomp_t** backward_fp, int* backward_count,
             parseAttrs_t* parse, inferType_t* type,
             inferShape_t* shape, mutateInputs_t* mutate,
             createOpState_t* create_op, int *isSGop) {
-    CustomOp op = Registry<CustomOp>::get()->get(idx);
-    *name = op.name;
-    *fcomp = op.forward;
-    *fgrad = op.backward;
-    *parse = op.parse_attrs;
-    *type = op.infer_type;
-    *shape = op.infer_shape;
-    *mutate = op.mutate_inputs;
-    *create_op = op.create_opstate;
-    *isSGop = op.isSGop;
+    CustomOp *op = &(Registry<CustomOp>::get()->get(idx));
 
 Review comment:
   or change 
   ```
    T* get(int idx) {
       return entries[idx];
     }
   ```

[GitHub] [incubator-mxnet] samskalicky commented on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-577787320
 
 
   @mxnet-label-bot add [pr-awaiting-review]

[GitHub] [incubator-mxnet] samskalicky commented on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-578608204
 
 
  @rondogency if you're going to add a new library for the GPU tests, you need to modify CMakeLists.txt and Jenkins_steps.groovy. The CI builds with CMake, and you need to persist the library between the build and test stages of the CI by modifying the groovy file.
  
  See the subgraph property PR modifications to these files that add `libsubgraph_lib.so`:
  https://github.com/apache/incubator-mxnet/pull/17034/files
   

[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r369383858
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -328,16 +358,31 @@ typedef void* (*xpu_malloc_t)(void*, int);
  */
 class OpResource {
  public:
-  OpResource(xpu_malloc_t cm, void* ca) : cpu_malloc(cm), cpu_alloc(ca) {}
+  OpResource(xpu_malloc_t cm, void* ca, xpu_malloc_t gm, void* ga, void* st)
 
 Review comment:
  I know you're just following the existing precedent, but can we change names like `cm` to `cpu_mem_alloc` so it's more readable?
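
  For illustration, a sketch of the same constructor with more descriptive parameter names (the names here are just a suggestion; the member initializers are unchanged from the diff):
  ```
    OpResource(xpu_malloc_t cpu_malloc_fp, void* cpu_alloc_obj,
               xpu_malloc_t gpu_malloc_fp, void* gpu_alloc_obj, void* stream)
      : cpu_malloc(cpu_malloc_fp), gpu_malloc(gpu_malloc_fp),
        cpu_alloc(cpu_alloc_obj), gpu_alloc(gpu_alloc_obj), cuda_stream(stream) {}
  ```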

[GitHub] [incubator-mxnet] samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-578355722
 
 
   @rondogency can you try this:
   ```
   #if defined(_WIN32) || defined(_WIN64) || defined(__WINDOWS__)
     #define VISIBILITY  
   #else
     #define VISIBILITY  __attribute__ ((visibility ("hidden")))
   #endif
   
   template <class T>
   class Registry {
    public:
     /*!
      * \brief get singleton pointer to class
      * \returns pointer to class
      */
     static Registry* get() VISIBILITY {
       static Registry inst;
       return &inst;
     }
   ```

[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r371613731
 
 

 ##########
 File path: CMakeLists.txt
 ##########
 @@ -752,6 +741,31 @@ elseif(MSVC)
 
 endif()
 
+add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc)
+add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc)
+target_include_directories(customop_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
+target_include_directories(subgraph_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
+if (USE_CUDA)
+  add_library(customop_gpu_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/relu_lib.cu)
+  target_include_directories(customop_gpu_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include/mxnet)
+endif()
+if(UNIX)
+  target_compile_options(customop_lib PUBLIC -shared)
+  target_compile_options(subgraph_lib PUBLIC -shared)
+  if (USE_CUDA)
+    target_compile_options(customop_gpu_lib PUBLIC -shared)
+  endif()
+elseif(MSVC)
+  target_compile_options(customop_lib PUBLIC /LD)
+  target_compile_options(subgraph_lib PUBLIC /LD)
+  set_target_properties(customop_lib PROPERTIES PREFIX "lib")
+  set_target_properties(subgraph_lib PROPERTIES PREFIX "lib")
+  if (USE_CUDA)
+    target_compile_options(customop_gpu_lib PUBLIC /LD)
 
 Review comment:
  Also try this so we can debug the CMake-generated makefile:
   ```
     set(CMAKE_VERBOSE_MAKEFILE ON)
   ```

[GitHub] [incubator-mxnet] samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky commented on a change in pull request #17270: [WIP] Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#discussion_r367697443
 
 

 ##########
 File path: include/mxnet/lib_api.h
 ##########
 @@ -566,16 +609,20 @@ typedef MXReturnValue (*createOpState_t)(std::map<std::string, std::string>,
 class CustomOp {
  public:
   explicit CustomOp(const char* op_name) : name(op_name),
-    forward(NULL), backward(NULL), parse_attrs(NULL), infer_type(NULL),
-    infer_shape(NULL), mutate_inputs(NULL), create_opstate(NULL),
-    isSGop(false) {}
-  ~CustomOp() {}
-  CustomOp& setForward(fcomp_t fcomp) {
-    forward = fcomp;
+    parse_attrs(NULL), infer_type(NULL), infer_shape(NULL), mutate_inputs(NULL),
+    create_opstate(NULL), isSGop(false) {}
+  CustomOp& setForward(fcomp_t fcomp, std::string ctx) {
+    char* cstr = new char[ctx.length()+1];
+    strncpy(cstr, ctx.c_str(), ctx.length()+1);
+    forward_ctx_cstr.push_back(cstr);
 
 Review comment:
  I think we should use `const char*` here, like we do for the op_name.
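
  For illustration, a sketch of the `const char*` variant. It keeps the push_back-based storage from the diff above and drops the copy, under the assumption that callers pass string literals, as they already do for op_name:
  ```
    CustomOp& setForward(fcomp_t fcomp, const char* ctx) {
      // no copy: the context tag is expected to be a string literal,
      // the same lifetime assumption already made for op_name
      forward_ctx_cstr.push_back(ctx);
      forward_fp.push_back(fcomp);
      return *this;
    }
  ```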

[GitHub] [incubator-mxnet] samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on issue #17270: Dynamic custom operator GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17270#issuecomment-578355722
 
 
   @rondogency can you try this:
   ```
   #if defined(_WIN32) && defined(_WIN64) && defined(__WINDOWS__)
     #define VISIBILITY  
   #else
     #define VISIBILITY  __attribute__ ((visibility ("hidden")))
   #endif
   
   template <class T>
   class Registry {
    public:
     /*!
      * \brief get singleton pointer to class
      * \returns pointer to class
      */
     static Registry* get() {
       static Registry inst VISIBILITY;
       return &inst;
     }
   ```
