You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@systemml.apache.org by ni...@apache.org on 2017/09/07 19:50:02 UTC
[5/5] systemml git commit: [SYSTEMML-540] Support sparse GPU conv2d
as well as fix memory estimation of convolution operations
[SYSTEMML-540] Support sparse GPU conv2d as well as fix memory estimation of convolution operations
Design doc: Memory estimation of GPU operators
- Since not all operator are supported on GPU, isGPUEnabled indicates
whether an operation is enabled for GPU. This method doesnot take into
account any memory estimates.
- To simplify memory estimation logic, the methods computeOutputMemEstimate
and computeIntermediateMemEstimate should return maximum of memory
required for GPU and CP operators.
- Additionally, these methods are guarded so that when -gpu flag is not
provided, additional memory overhead due to GPU are ignored. For example:
sparse-to-dense conversion on GPU.
- (WIP) Every GPU operators should respect the memory returned by
computeIntermediateMemEstimate (and computeOutputMemEstimate - see below
point).
- (WIP) Every GPU operator should create output in the same format as the
corresponding CP operator. That is, computeOutputMemEstimate are
consistent across both CP and GPU in terms of worst-case.
- The drawback of using maximum memory (mem = Math.max(mem_gpu, mem_gpu))
are:
a. GPU operator is not selected when mem_gpu < total memory available on GPU
< mem
b. CP operator is not selected (i.e. distributed operator compiled) when
mem_cpu < driver memory budget < mem
Closes #650.
Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/772d9302
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/772d9302
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/772d9302
Branch: refs/heads/master
Commit: 772d9302dc196b047134ea491542d55113f52a08
Parents: a0cf8e3
Author: Niketan Pansare <np...@us.ibm.com>
Authored: Thu Sep 7 11:49:52 2017 -0800
Committer: Niketan Pansare <np...@us.ibm.com>
Committed: Thu Sep 7 12:49:52 2017 -0700
----------------------------------------------------------------------
src/main/cpp/kernels/SystemML.cu | 19 +
src/main/cpp/kernels/SystemML.ptx | 2757 +++++++++---------
.../org/apache/sysml/hops/ConvolutionOp.java | 458 ++-
src/main/java/org/apache/sysml/hops/Hop.java | 63 +-
.../apache/sysml/lops/ConvolutionTransform.java | 11 +-
.../cp/ConvolutionCPInstruction.java | 91 +-
.../gpu/ConvolutionGPUInstruction.java | 72 +-
.../gpu/MatrixBuiltinGPUInstruction.java | 3 +-
.../instructions/gpu/context/GPUContext.java | 6 +
.../matrix/data/ConvolutionParameters.java | 25 +
.../runtime/matrix/data/LibMatrixCUDA.java | 1041 +------
.../runtime/matrix/data/LibMatrixCuDNN.java | 1219 ++++++++
12 files changed, 3229 insertions(+), 2536 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/systemml/blob/772d9302/src/main/cpp/kernels/SystemML.cu
----------------------------------------------------------------------
diff --git a/src/main/cpp/kernels/SystemML.cu b/src/main/cpp/kernels/SystemML.cu
index d64d8aa..bb6482d 100644
--- a/src/main/cpp/kernels/SystemML.cu
+++ b/src/main/cpp/kernels/SystemML.cu
@@ -156,6 +156,25 @@ __global__ void relu_backward(double* X, double* dout, double* ret, int rlen, i
}
}
+/**
+ * Performs inplace addition: ret += input
+ *
+ * @param input rhs input array allocated on the GPU
+ * @param ret the input and output array allocated on the GPU
+ * @param rlen the number of rows
+ * @param clen the number of columns
+ */
+extern "C"
+__global__ void inplace_add(double* input, double* ret, int rlen, int clen) {
+ int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ int ix = tid / clen;
+ int iy = tid % clen;
+ if(ix < rlen && iy < clen) {
+ int index = ix * clen + iy;
+ ret[index] += input[index];
+ }
+}
+
// Performs the operation corresponding to the DML script:
// ones = matrix(1, rows=1, cols=Hout*Wout)
// output = input + matrix(bias %*% ones, rows=1, cols=F*Hout*Wout)