Posted to dev@singa.apache.org by GitBox <gi...@apache.org> on 2019/11/25 07:32:16 UTC

[GitHub] [singa] chrishkchris opened a new pull request #563: SINGA-490 Improve Efficiency of ReLU, Add, Sub, EltwiseMult

URL: https://github.com/apache/singa/pull/563
 
 
   This PR addresses two issues: (1) improving the efficiency of ReLU, and (2) optimizing Add, Sub, and EltwiseMult following the recent broadcasting update.
   
   1. ReLU appears in every layer of typical CNN structures, so improving its computational efficiency speeds up CNNs as a whole.
   
   The existing ReLU backward function uses an FP32 multiplication with a floating-point binary mask:
   
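   def backward(self, dy):
      # Build a float mask: 1.0 where the forward input > 0, else 0.0
      dx = singa.GTFloat(self.input, 0.0)
      # Multiply the mask by the output gradient (one FP32 multiply per element)
      dx *= dy
      return dx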
   
   The new implementation is simpler and more efficient:
   
   If the ReLU input in the forward pass is > 0, it copies the output gradient value; otherwise it assigns 0:
   
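   // in1: output gradient (dy); in2: ReLU forward input; out: input gradient (dx)
   __global__ void KernelReLUBackward(const size_t n, const float *in1,
                                      const float *in2, float *out) {
     // Grid-stride loop: each thread handles elements i, i + stride, ...
     for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
          i += blockDim.x * gridDim.x) {
       out[i] = in2[i] > 0 ? in1[i] : 0.0f;
     }
   }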
   
   Because it assigns the value directly instead of performing an FP32 multiplication, it is more efficient.
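
   As a quick sanity check of the reasoning above, here is a minimal pure-Python sketch of the two backward formulations. The function names relu_backward_mask and relu_backward_select are hypothetical and are not part of SINGA; the lists stand in for tensors.

```python
# Pure-Python reference for the two ReLU backward formulations.
# x is the forward input, dy the output gradient (flat lists of floats).
# Both function names are illustrative only, not SINGA APIs.

def relu_backward_mask(x, dy):
    """Mask approach: build a 0.0/1.0 mask, then multiply it by dy."""
    mask = [1.0 if xi > 0 else 0.0 for xi in x]
    return [m * g for m, g in zip(mask, dy)]


def relu_backward_select(x, dy):
    """Select approach: copy dy where x > 0, otherwise write 0.0."""
    return [g if xi > 0 else 0.0 for xi, g in zip(x, dy)]


x = [-1.5, 0.0, 2.0, 3.5]
dy = [0.1, 0.2, 0.3, 0.4]

# For finite gradients the two formulations agree element by element.
assert relu_backward_mask(x, dy) == relu_backward_select(x, dy)
print(relu_backward_select(x, dy))
```

   The two versions can differ only for non-finite gradients: multiplying a 0.0 mask entry by inf or NaN yields NaN, whereas the select version writes 0.0.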
   
   In addition, I have added forward and backward test cases to test_operation.py, covering both CPU and GPU. They pass.
   
   2. After broadcasting was implemented in Add, Sub, and EltwiseMult, the ResNet throughput changed, so I have optimized these operations for maximum throughput.
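
   The description above does not say what the optimization is. One common pattern, shown here purely as an assumption and not as what this PR does, is to take a fast path when both operands already have the same shape and fall back to broadcasting otherwise. The names eltwise, broadcast_shape and flat_index are hypothetical.

```python
import operator
from itertools import product


def broadcast_shape(sa, sb):
    """Right-align two shapes and return (output shape, padded sa, padded sb)."""
    n = max(len(sa), len(sb))
    sa = (1,) * (n - len(sa)) + tuple(sa)
    sb = (1,) * (n - len(sb)) + tuple(sb)
    out = []
    for da, db in zip(sa, sb):
        if da != db and 1 not in (da, db):
            raise ValueError("shapes are not broadcastable")
        out.append(max(da, db))
    return tuple(out), sa, sb


def flat_index(idx, shape):
    """Row-major offset; size-1 dimensions always read element 0."""
    off = 0
    for i, d in zip(idx, shape):
        off = off * d + (i if d > 1 else 0)
    return off


def eltwise(op, a, shape_a, b, shape_b):
    # Fast path (assumed): identical shapes need no index arithmetic.
    if tuple(shape_a) == tuple(shape_b):
        return [op(x, y) for x, y in zip(a, b)]
    # Slow path: general broadcasting over the output index space.
    out_shape, sa, sb = broadcast_shape(shape_a, shape_b)
    return [op(a[flat_index(idx, sa)], b[flat_index(idx, sb)])
            for idx in product(*(range(d) for d in out_shape))]


# Same-shape operands take the fast path.
same = eltwise(operator.mul, [1, 2], (2,), [3, 4], (2,))
# A (2, 3) tensor plus a (3,) row vector takes the broadcasting path.
bcast = eltwise(operator.add, [1, 2, 3, 4, 5, 6], (2, 3), [10, 20, 30], (3,))
```

   In a network such as ResNet most element-wise operands have matching shapes, so a check like this would keep the common case as cheap as it was before broadcasting support was added.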
   
   
   Test and Results:
   
   ```
   ubuntu@ip-172-31-35-106:~/singa/build/bin$ ./test_singa
   Running main() from gtest_main.cc
   [==========] Running 238 tests from 48 test cases.
   [----------] Global test environment set-up.
   [----------] 1 test from Accuracy
   [ RUN      ] Accuracy.Compute
   [       OK ] Accuracy.Compute (0 ms)
   [----------] 1 test from Accuracy (0 ms total)
   
   [----------] 3 tests from Activation
   [ RUN      ] Activation.Setup
   [       OK ] Activation.Setup (0 ms)
   [ RUN      ] Activation.Forward
   [       OK ] Activation.Forward (0 ms)
   [ RUN      ] Activation.Backward
   [       OK ] Activation.Backward (0 ms)
   [----------] 3 tests from Activation (0 ms total)
   
   [----------] 2 tests from AdaGrad
   [ RUN      ] AdaGrad.ApplyCPU
   [       OK ] AdaGrad.ApplyCPU (1 ms)
   [ RUN      ] AdaGrad.ApplyCUDA
   [       OK ] AdaGrad.ApplyCUDA (848 ms)
   [----------] 2 tests from AdaGrad (849 ms total)
   
   [----------] 3 tests from BatchNorm
   [ RUN      ] BatchNorm.Setup
   [       OK ] BatchNorm.Setup (0 ms)
   [ RUN      ] BatchNorm.Forward
   [       OK ] BatchNorm.Forward (0 ms)
   [ RUN      ] BatchNorm.Backward
   [       OK ] BatchNorm.Backward (0 ms)
   [----------] 3 tests from BatchNorm (0 ms total)
   
   [----------] 2 tests from BinFileWriter
   [ RUN      ] BinFileWriter.Create
   [       OK ] BinFileWriter.Create (1 ms)
   [ RUN      ] BinFileWriter.Append
   [       OK ] BinFileWriter.Append (0 ms)
   [----------] 2 tests from BinFileWriter (1 ms total)
   
   [----------] 2 tests from BinFileReader
   [ RUN      ] BinFileReader.Read
   [       OK ] BinFileReader.Read (0 ms)
   [ RUN      ] BinFileReader.SeekToFirst
   [       OK ] BinFileReader.SeekToFirst (0 ms)
   [----------] 2 tests from BinFileReader (0 ms total)
   
   [----------] 3 tests from Channel
   [ RUN      ] Channel.InitChannel
   [       OK ] Channel.InitChannel (0 ms)
   [ RUN      ] Channel.SendStringToFile
   [W d1125 t06:24:49 p06772:-560 /home/ubuntu/singa/src/utils/channel.cc:70] Messages will be appended to an existed file: /tmp/test_channel
   [       OK ] Channel.SendStringToFile (0 ms)
   [ RUN      ] Channel.SendStringToFileAndStderr
   test to both file and stderr
   [       OK ] Channel.SendStringToFileAndStderr (0 ms)
   [----------] 3 tests from Channel (0 ms total)
   
   [----------] 9 tests from Concat
   [ RUN      ] Concat.Setup
   [       OK ] Concat.Setup (0 ms)
   [ RUN      ] Concat.ForwardConcatRowCpp
   [       OK ] Concat.ForwardConcatRowCpp (0 ms)
   [ RUN      ] Concat.ForwardConcatColumnCpp
   [       OK ] Concat.ForwardConcatColumnCpp (0 ms)
   [ RUN      ] Concat.ForwardConcatRowCuda
   [       OK ] Concat.ForwardConcatRowCuda (36 ms)
   [ RUN      ] Concat.ForwardConcatColumnCuda
   [       OK ] Concat.ForwardConcatColumnCuda (36 ms)
   [ RUN      ] Concat.BackwardConcatRowCpp
   [       OK ] Concat.BackwardConcatRowCpp (0 ms)
   [ RUN      ] Concat.BackwardConcatColumn
   [       OK ] Concat.BackwardConcatColumn (0 ms)
   [ RUN      ] Concat.BackwardConcatRowCuda
   [       OK ] Concat.BackwardConcatRowCuda (53 ms)
   [ RUN      ] Concat.BackwardConcatColumnCuda
   [       OK ] Concat.BackwardConcatColumnCuda (48 ms)
   [----------] 9 tests from Concat (173 ms total)
   
   [----------] 3 tests from Convolution
   [ RUN      ] Convolution.Setup
   [       OK ] Convolution.Setup (0 ms)
   [ RUN      ] Convolution.Forward
   [       OK ] Convolution.Forward (0 ms)
   [ RUN      ] Convolution.Backward
   [       OK ] Convolution.Backward (0 ms)
   [----------] 3 tests from Convolution (0 ms total)
   
   [----------] 4 tests from CppCPU
   [ RUN      ] CppCPU.Constructor
   [       OK ] CppCPU.Constructor (0 ms)
   [ RUN      ] CppCPU.MemoryMallocFree
   [       OK ] CppCPU.MemoryMallocFree (0 ms)
   [ RUN      ] CppCPU.Exec
   [       OK ] CppCPU.Exec (0 ms)
   [ RUN      ] CppCPU.CopyData
   [       OK ] CppCPU.CopyData (0 ms)
   [----------] 4 tests from CppCPU (0 ms total)
   
   [----------] 8 tests from TestSoftmaxCrossEntropy
   [ RUN      ] TestSoftmaxCrossEntropy.CppForward
   [       OK ] TestSoftmaxCrossEntropy.CppForward (0 ms)
   [ RUN      ] TestSoftmaxCrossEntropy.CppForwardAryTarget
   [       OK ] TestSoftmaxCrossEntropy.CppForwardAryTarget (0 ms)
   [ RUN      ] TestSoftmaxCrossEntropy.CppBackward
   [       OK ] TestSoftmaxCrossEntropy.CppBackward (0 ms)
   [ RUN      ] TestSoftmaxCrossEntropy.CppBackwardAryTarget
   [       OK ] TestSoftmaxCrossEntropy.CppBackwardAryTarget (0 ms)
   [ RUN      ] TestSoftmaxCrossEntropy.CudaForward
   [       OK ] TestSoftmaxCrossEntropy.CudaForward (33 ms)
   [ RUN      ] TestSoftmaxCrossEntropy.CudaForwardAryTarget
   [       OK ] TestSoftmaxCrossEntropy.CudaForwardAryTarget (33 ms)
   [ RUN      ] TestSoftmaxCrossEntropy.CudaBackward
   [       OK ] TestSoftmaxCrossEntropy.CudaBackward (33 ms)
   [ RUN      ] TestSoftmaxCrossEntropy.CudaBackwardAryTarget
   [       OK ] TestSoftmaxCrossEntropy.CudaBackwardAryTarget (32 ms)
   [----------] 8 tests from TestSoftmaxCrossEntropy (131 ms total)
   
   [----------] 1 test from CSV
   [ RUN      ] CSV.EncoderDecode
   [       OK ] CSV.EncoderDecode (0 ms)
   [----------] 1 test from CSV (0 ms total)
   
   [----------] 3 tests from CudnnActivation
   [ RUN      ] CudnnActivation.Setup
   [       OK ] CudnnActivation.Setup (0 ms)
   [ RUN      ] CudnnActivation.Forward
   [       OK ] CudnnActivation.Forward (30 ms)
   [ RUN      ] CudnnActivation.Backward
   [       OK ] CudnnActivation.Backward (30 ms)
   [----------] 3 tests from CudnnActivation (60 ms total)
   
   [----------] 3 tests from CudnnBatchNorm
   [ RUN      ] CudnnBatchNorm.Setup
   [       OK ] CudnnBatchNorm.Setup (0 ms)
   [ RUN      ] CudnnBatchNorm.Forward
   [       OK ] CudnnBatchNorm.Forward (30 ms)
   [ RUN      ] CudnnBatchNorm.Backward
   [       OK ] CudnnBatchNorm.Backward (30 ms)
   [----------] 3 tests from CudnnBatchNorm (60 ms total)
   
   [----------] 3 tests from CudnnConvolution
   [ RUN      ] CudnnConvolution.Setup
   [       OK ] CudnnConvolution.Setup (0 ms)
   [ RUN      ] CudnnConvolution.Forward
   [       OK ] CudnnConvolution.Forward (30 ms)
   [ RUN      ] CudnnConvolution.Backward
   [       OK ] CudnnConvolution.Backward (28 ms)
   [----------] 3 tests from CudnnConvolution (58 ms total)
   
   [----------] 3 tests from CudnnConvolution_AT
   [ RUN      ] CudnnConvolution_AT.Setup
   [       OK ] CudnnConvolution_AT.Setup (0 ms)
   [ RUN      ] CudnnConvolution_AT.Forward
   [       OK ] CudnnConvolution_AT.Forward (29 ms)
   [ RUN      ] CudnnConvolution_AT.Backward
   [       OK ] CudnnConvolution_AT.Backward (29 ms)
   [----------] 3 tests from CudnnConvolution_AT (58 ms total)
   
   [----------] 3 tests from CudnnDropout
   [ RUN      ] CudnnDropout.Setup
   [       OK ] CudnnDropout.Setup (0 ms)
   [ RUN      ] CudnnDropout.Forward
   [       OK ] CudnnDropout.Forward (53 ms)
   [ RUN      ] CudnnDropout.Backward
   [       OK ] CudnnDropout.Backward (53 ms)
   [----------] 3 tests from CudnnDropout (106 ms total)
   
   [----------] 3 tests from CudnnLRN
   [ RUN      ] CudnnLRN.Setup
   [       OK ] CudnnLRN.Setup (0 ms)
   [ RUN      ] CudnnLRN.Forward
   [       OK ] CudnnLRN.Forward (25 ms)
   [ RUN      ] CudnnLRN.Backward
   [       OK ] CudnnLRN.Backward (25 ms)
   [----------] 3 tests from CudnnLRN (50 ms total)
   
   [----------] 3 tests from CudnnPooling
   [ RUN      ] CudnnPooling.Setup
   [       OK ] CudnnPooling.Setup (0 ms)
   [ RUN      ] CudnnPooling.Forward
   [       OK ] CudnnPooling.Forward (25 ms)
   [ RUN      ] CudnnPooling.Backward
   [       OK ] CudnnPooling.Backward (25 ms)
   [----------] 3 tests from CudnnPooling (50 ms total)
   
   [----------] 3 tests from TestCudnnRNN
   [ RUN      ] TestCudnnRNN.Setup
   [       OK ] TestCudnnRNN.Setup (0 ms)
   [ RUN      ] TestCudnnRNN.Forward
   [       OK ] TestCudnnRNN.Forward (48 ms)
   [ RUN      ] TestCudnnRNN.Backward
   [       OK ] TestCudnnRNN.Backward (48 ms)
   [----------] 3 tests from TestCudnnRNN (96 ms total)
   
   [----------] 5 tests from CudnnSoftmax
   [ RUN      ] CudnnSoftmax.Setup
   [       OK ] CudnnSoftmax.Setup (0 ms)
   [ RUN      ] CudnnSoftmax.Forward1D
   [       OK ] CudnnSoftmax.Forward1D (24 ms)
   [ RUN      ] CudnnSoftmax.Backward1D
   [       OK ] CudnnSoftmax.Backward1D (24 ms)
   [ RUN      ] CudnnSoftmax.Forward2D
   [       OK ] CudnnSoftmax.Forward2D (24 ms)
   [ RUN      ] CudnnSoftmax.Backward2D
   [       OK ] CudnnSoftmax.Backward2D (24 ms)
   [----------] 5 tests from CudnnSoftmax (96 ms total)
   
   [----------] 5 tests from Dense
   [ RUN      ] Dense.Setup
   [       OK ] Dense.Setup (0 ms)
   [ RUN      ] Dense.ForwardCpp
   [       OK ] Dense.ForwardCpp (1 ms)
   [ RUN      ] Dense.BackwardCpp
   [       OK ] Dense.BackwardCpp (0 ms)
   [ RUN      ] Dense.ForwardCuda
   [       OK ] Dense.ForwardCuda (24 ms)
   [ RUN      ] Dense.BackwardCuda
   [       OK ] Dense.BackwardCuda (24 ms)
   [----------] 5 tests from Dense (49 ms total)
   
   [----------] 3 tests from Dropout
   [ RUN      ] Dropout.Setup
   [       OK ] Dropout.Setup (0 ms)
   [ RUN      ] Dropout.Forward
   [       OK ] Dropout.Forward (0 ms)
   [ RUN      ] Dropout.Backward
   [       OK ] Dropout.Backward (1 ms)
   [----------] 3 tests from Dropout (1 ms total)
   
   [----------] 5 tests from Flatten
   [ RUN      ] Flatten.Setup
   [       OK ] Flatten.Setup (0 ms)
   [ RUN      ] Flatten.ForwardCPU
   [       OK ] Flatten.ForwardCPU (0 ms)
   [ RUN      ] Flatten.BackwardCPU
   [       OK ] Flatten.BackwardCPU (0 ms)
   [ RUN      ] Flatten.ForwardGPU
   [       OK ] Flatten.ForwardGPU (24 ms)
   [ RUN      ] Flatten.BackwardGPU
   [       OK ] Flatten.BackwardGPU (24 ms)
   [----------] 5 tests from Flatten (48 ms total)
   
   [----------] 5 tests from ImageTransformer
   [ RUN      ] ImageTransformer.Setup
   [       OK ] ImageTransformer.Setup (0 ms)
   [ RUN      ] ImageTransformer.Apply3D
   [       OK ] ImageTransformer.Apply3D (0 ms)
   [ RUN      ] ImageTransformer.Apply2D
   [       OK ] ImageTransformer.Apply2D (0 ms)
   [ RUN      ] ImageTransformer.Crop
   [       OK ] ImageTransformer.Crop (0 ms)
   [ RUN      ] ImageTransformer.Mirror
   [       OK ] ImageTransformer.Mirror (0 ms)
   [----------] 5 tests from ImageTransformer (0 ms total)
   
   [----------] 5 tests from Initializer
   [ RUN      ] Initializer.Constant
   [       OK ] Initializer.Constant (0 ms)
   [ RUN      ] Initializer.Gaussian
   [       OK ] Initializer.Gaussian (0 ms)
   [ RUN      ] Initializer.ConstantCUDA
   [       OK ] Initializer.ConstantCUDA (24 ms)
   [ RUN      ] Initializer.GaussianCUDA
   [       OK ] Initializer.GaussianCUDA (35 ms)
   [ RUN      ] Initializer.XavierCUDA
   [       OK ] Initializer.XavierCUDA (24 ms)
   [----------] 5 tests from Initializer (83 ms total)
   
   [----------] 2 tests from Layer
   [ RUN      ] Layer.CreateLayer
   [       OK ] Layer.CreateLayer (0 ms)
   [ RUN      ] Layer.CreateCudnnLayer
   [       OK ] Layer.CreateCudnnLayer (0 ms)
   [----------] 2 tests from Layer (0 ms total)
   
   [----------] 6 tests from Logging
   [ RUN      ] Logging.InfoLogging
   [I d1125 t06:24:50 p06772:-560 /home/ubuntu/singa/test/singa/test_logging.cc:29] test info logging
   [       OK ] Logging.InfoLogging (0 ms)
   [ RUN      ] Logging.WarningLogging
   [W d1125 t06:24:50 p06772:-560 /home/ubuntu/singa/test/singa/test_logging.cc:35] test warning logging
   [       OK ] Logging.WarningLogging (0 ms)
   [ RUN      ] Logging.ErrorLogging
   [E d1125 t06:24:50 p06772:-560 /home/ubuntu/singa/test/singa/test_logging.cc:41] test error logging
   [       OK ] Logging.ErrorLogging (0 ms)
   [ RUN      ] Logging.FatalLogging
   [       OK ] Logging.FatalLogging (0 ms)
   [ RUN      ] Logging.SetLogDestination
   [       OK ] Logging.SetLogDestination (1 ms)
   [ RUN      ] Logging.StderrLoggingLevel
   [W d1125 t06:24:50 p06772:-560 /home/ubuntu/singa/test/singa/test_logging.cc:62] test warning logging to stderr and file
   [E d1125 t06:24:50 p06772:-560 /home/ubuntu/singa/test/singa/test_logging.cc:63] test error logging to stderr and file
   [       OK ] Logging.StderrLoggingLevel (0 ms)
   [----------] 6 tests from Logging (1 ms total)
   
   [----------] 3 tests from LRN
   [ RUN      ] LRN.Setup
   [       OK ] LRN.Setup (0 ms)
   [ RUN      ] LRN.Forward
   [       OK ] LRN.Forward (0 ms)
   [ RUN      ] LRN.Backward
   [       OK ] LRN.Backward (0 ms)
   [----------] 3 tests from LRN (0 ms total)
   
   [----------] 1 test from MemPool
   [ RUN      ] MemPool.CompareCudaCnmem
   [       OK ] MemPool.CompareCudaCnmem (955 ms)
   [----------] 1 test from MemPool (955 ms total)
   
   [----------] 4 tests from TestMSE
   [ RUN      ] TestMSE.CppForward
   [       OK ] TestMSE.CppForward (0 ms)
   [ RUN      ] TestMSE.CppBackward
   [       OK ] TestMSE.CppBackward (0 ms)
   [ RUN      ] TestMSE.CudaForward
   [       OK ] TestMSE.CudaForward (24 ms)
   [ RUN      ] TestMSE.CudaBackward
   [       OK ] TestMSE.CudaBackward (25 ms)
   [----------] 4 tests from TestMSE (49 ms total)
   
   [----------] 2 tests from Nesterov
   [ RUN      ] Nesterov.ApplyCPU
   [       OK ] Nesterov.ApplyCPU (0 ms)
   [ RUN      ] Nesterov.ApplyCUDA
   [       OK ] Nesterov.ApplyCUDA (24 ms)
   [----------] 2 tests from Nesterov (24 ms total)
   
   [----------] 4 tests from OperationBatchNorm
   [ RUN      ] OperationBatchNorm.ForwardInference
   [       OK ] OperationBatchNorm.ForwardInference (5 ms)
   [ RUN      ] OperationBatchNorm.ForwardInference4D
   [       OK ] OperationBatchNorm.ForwardInference4D (1 ms)
   [ RUN      ] OperationBatchNorm.ForwardTraining
   [       OK ] OperationBatchNorm.ForwardTraining (0 ms)
   [ RUN      ] OperationBatchNorm.Backward
   [       OK ] OperationBatchNorm.Backward (0 ms)
   [----------] 4 tests from OperationBatchNorm (6 ms total)
   
   [----------] 2 tests from Operation_Convolution
   [ RUN      ] Operation_Convolution.Forward
   [       OK ] Operation_Convolution.Forward (24 ms)
   [ RUN      ] Operation_Convolution.Backward
   [       OK ] Operation_Convolution.Backward (1 ms)
   [----------] 2 tests from Operation_Convolution (25 ms total)
   
   [----------] 4 tests from OperationPooling
   [ RUN      ] OperationPooling.Forward
   [       OK ] OperationPooling.Forward (0 ms)
   [ RUN      ] OperationPooling.ForwardAverage
   [       OK ] OperationPooling.ForwardAverage (0 ms)
   [ RUN      ] OperationPooling.Backward
   [       OK ] OperationPooling.Backward (0 ms)
   [ RUN      ] OperationPooling.BackwardAvg
   [       OK ] OperationPooling.BackwardAvg (0 ms)
   [----------] 4 tests from OperationPooling (0 ms total)
   
   [----------] 5 tests from Platform
   [ RUN      ] Platform.CreateMultDevice
   [       OK ] Platform.CreateMultDevice (4 ms)
   [ RUN      ] Platform.NumGPUs
   [       OK ] Platform.NumGPUs (0 ms)
   [ RUN      ] Platform.QueryMem
   [       OK ] Platform.QueryMem (0 ms)
   [ RUN      ] Platform.CreateDevice
   [       OK ] Platform.CreateDevice (1 ms)
   [ RUN      ] Platform.CreatTensor
   [       OK ] Platform.CreatTensor (0 ms)
   [----------] 5 tests from Platform (5 ms total)
   
   [----------] 3 tests from Pooling
   [ RUN      ] Pooling.Setup
   [       OK ] Pooling.Setup (0 ms)
   [ RUN      ] Pooling.Forward
   [       OK ] Pooling.Forward (0 ms)
   [ RUN      ] Pooling.Backward
   [       OK ] Pooling.Backward (0 ms)
   [----------] 3 tests from Pooling (0 ms total)
   
   [----------] 5 tests from PReLU
   [ RUN      ] PReLU.Setup
   [       OK ] PReLU.Setup (0 ms)
   [ RUN      ] PReLU.ForwardCPU
   [       OK ] PReLU.ForwardCPU (0 ms)
   [ RUN      ] PReLU.BackwardCPU
   [       OK ] PReLU.BackwardCPU (0 ms)
   [ RUN      ] PReLU.ForwardGPU
   [       OK ] PReLU.ForwardGPU (2 ms)
   [ RUN      ] PReLU.BackwardGPU
   [       OK ] PReLU.BackwardGPU (33 ms)
   [----------] 5 tests from PReLU (35 ms total)
   
   [----------] 2 tests from RMSProp
   [ RUN      ] RMSProp.ApplyCPU
   [       OK ] RMSProp.ApplyCPU (0 ms)
   [ RUN      ] RMSProp.ApplyCUDA
   [       OK ] RMSProp.ApplyCUDA (22 ms)
   [----------] 2 tests from RMSProp (22 ms total)
   
   [----------] 4 tests from SGD
   [ RUN      ] SGD.ApplyWithoutMomentum
   [       OK ] SGD.ApplyWithoutMomentum (0 ms)
   [ RUN      ] SGD.ApplyWithMomentum
   [       OK ] SGD.ApplyWithMomentum (0 ms)
   [ RUN      ] SGD.ApplyWithoutMomentumCuda
   [       OK ] SGD.ApplyWithoutMomentumCuda (23 ms)
   [ RUN      ] SGD.ApplyWithMomentumCuda
   [       OK ] SGD.ApplyWithMomentumCuda (22 ms)
   [----------] 4 tests from SGD (45 ms total)
   
   [----------] 9 tests from Slice
   [ RUN      ] Slice.Setup
   [       OK ] Slice.Setup (0 ms)
   [ RUN      ] Slice.ForwardSliceRowCpp
   [       OK ] Slice.ForwardSliceRowCpp (0 ms)
   [ RUN      ] Slice.ForwardSliceColumn
   [       OK ] Slice.ForwardSliceColumn (0 ms)
   [ RUN      ] Slice.ForwardSliceRowCuda
   [       OK ] Slice.ForwardSliceRowCuda (33 ms)
   [ RUN      ] Slice.ForwardSliceColumnCuda
   [       OK ] Slice.ForwardSliceColumnCuda (33 ms)
   [ RUN      ] Slice.BackwardSliceRowCpp
   [       OK ] Slice.BackwardSliceRowCpp (0 ms)
   [ RUN      ] Slice.BackwardSliceColumn
   [       OK ] Slice.BackwardSliceColumn (0 ms)
   [ RUN      ] Slice.BackwardSliceRowCuda
   [       OK ] Slice.BackwardSliceRowCuda (23 ms)
   [ RUN      ] Slice.BackwardSliceColumnCuda
   [       OK ] Slice.BackwardSliceColumnCuda (22 ms)
   [----------] 9 tests from Slice (111 ms total)
   
   [----------] 3 tests from Snapshot
   [ RUN      ] Snapshot.WriteTest
   [       OK ] Snapshot.WriteTest (0 ms)
   [ RUN      ] Snapshot.ReadTest
   [       OK ] Snapshot.ReadTest (0 ms)
   [ RUN      ] Snapshot.ReadIntTest
   [       OK ] Snapshot.ReadIntTest (0 ms)
   [----------] 3 tests from Snapshot (1 ms total)
   
   [----------] 3 tests from Softmax
   [ RUN      ] Softmax.Setup
   [       OK ] Softmax.Setup (0 ms)
   [ RUN      ] Softmax.Forward
   [       OK ] Softmax.Forward (0 ms)
   [ RUN      ] Softmax.Backward
   [       OK ] Softmax.Backward (0 ms)
   [----------] 3 tests from Softmax (0 ms total)
   
   [----------] 11 tests from TensorClass
   [ RUN      ] TensorClass.Constructor
   [       OK ] TensorClass.Constructor (0 ms)
   [ RUN      ] TensorClass.Reshape
   [       OK ] TensorClass.Reshape (0 ms)
   [ RUN      ] TensorClass.AsType
   [       OK ] TensorClass.AsType (0 ms)
   [ RUN      ] TensorClass.ToDevice
   [       OK ] TensorClass.ToDevice (0 ms)
   [ RUN      ] TensorClass.CopyDataFromHostPtr
   [       OK ] TensorClass.CopyDataFromHostPtr (0 ms)
   [ RUN      ] TensorClass.CopyData
   [       OK ] TensorClass.CopyData (0 ms)
   [ RUN      ] TensorClass.Clone
   [       OK ] TensorClass.Clone (0 ms)
   [ RUN      ] TensorClass.T
   [       OK ] TensorClass.T (0 ms)
   [ RUN      ] TensorClass.Repeat
   [       OK ] TensorClass.Repeat (0 ms)
   [ RUN      ] TensorClass.RepeatData
   [       OK ] TensorClass.RepeatData (0 ms)
   [ RUN      ] TensorClass.Broadcast
   [       OK ] TensorClass.Broadcast (0 ms)
   [----------] 11 tests from TensorClass (0 ms total)
   
   [----------] 67 tests from TensorMath
   [ RUN      ] TensorMath.AbsCpp
   [       OK ] TensorMath.AbsCpp (0 ms)
   [ RUN      ] TensorMath.ExpCpp
   [       OK ] TensorMath.ExpCpp (0 ms)
   [ RUN      ] TensorMath.ExpStrideCpp
   WARNING: Logging before InitGoogleLogging() is written to STDERR
   I1125 06:24:52.185856  6772 tensor_math_cpp.h:146] not equal stride
   [       OK ] TensorMath.ExpStrideCpp (0 ms)
   [ RUN      ] TensorMath.LogCpp
   [       OK ] TensorMath.LogCpp (0 ms)
   [ RUN      ] TensorMath.ReLUCpp
   [       OK ] TensorMath.ReLUCpp (0 ms)
   [ RUN      ] TensorMath.SigmoidCpp
   [       OK ] TensorMath.SigmoidCpp (0 ms)
   [ RUN      ] TensorMath.SignCpp
   [       OK ] TensorMath.SignCpp (0 ms)
   [ RUN      ] TensorMath.SqrtCpp
   [       OK ] TensorMath.SqrtCpp (0 ms)
   [ RUN      ] TensorMath.SquareCpp
   [       OK ] TensorMath.SquareCpp (0 ms)
   [ RUN      ] TensorMath.TanhCpp
   [       OK ] TensorMath.TanhCpp (0 ms)
   [ RUN      ] TensorMath.SumCpp
   [       OK ] TensorMath.SumCpp (0 ms)
   [ RUN      ] TensorMath.SoftMaxCpp
   [       OK ] TensorMath.SoftMaxCpp (0 ms)
   [ RUN      ] TensorMath.SoftMaxOnAxis
   [       OK ] TensorMath.SoftMaxOnAxis (31 ms)
   [ RUN      ] TensorMath.LTCpp
   [       OK ] TensorMath.LTCpp (0 ms)
   [ RUN      ] TensorMath.LECpp
   [       OK ] TensorMath.LECpp (0 ms)
   [ RUN      ] TensorMath.GTCpp
   [       OK ] TensorMath.GTCpp (0 ms)
   [ RUN      ] TensorMath.GECpp
   [       OK ] TensorMath.GECpp (0 ms)
   [ RUN      ] TensorMath.PowCpp
   [       OK ] TensorMath.PowCpp (0 ms)
   [ RUN      ] TensorMath.SubCpp
   [       OK ] TensorMath.SubCpp (0 ms)
   [ RUN      ] TensorMath.EltwiseMultCpp
   [       OK ] TensorMath.EltwiseMultCpp (0 ms)
   [ RUN      ] TensorMath.DivCpp
   [       OK ] TensorMath.DivCpp (0 ms)
   [ RUN      ] TensorMath.BernoulliCpp
   [       OK ] TensorMath.BernoulliCpp (1 ms)
   [ RUN      ] TensorMath.UniformCpp
   [       OK ] TensorMath.UniformCpp (0 ms)
   [ RUN      ] TensorMath.GaussianCpp
   [       OK ] TensorMath.GaussianCpp (3 ms)
   [ RUN      ] TensorMath.AddTensorCpp
   [       OK ] TensorMath.AddTensorCpp (0 ms)
   [ RUN      ] TensorMath.AddTensorsCpp
   [       OK ] TensorMath.AddTensorsCpp (0 ms)
   [ RUN      ] TensorMath.SetValueCpp
   [       OK ] TensorMath.SetValueCpp (0 ms)
   [ RUN      ] TensorMath.ReshapeCpp
   [       OK ] TensorMath.ReshapeCpp (0 ms)
   [ RUN      ] TensorMath.TransposeReshapeCpp
   I1125 06:24:52.221244  6772 tensor_math_cpp.h:146] not equal stride
   I1125 06:24:52.221282  6772 tensor_math_cpp.h:146] not equal stride
   [       OK ] TensorMath.TransposeReshapeCpp (0 ms)
   [ RUN      ] TensorMath.TransposeFloatCpp
   I1125 06:24:52.221330  6772 tensor_math_cpp.h:146] not equal stride
   [       OK ] TensorMath.TransposeFloatCpp (0 ms)
   [ RUN      ] TensorMath.TransposeIntCpp
   I1125 06:24:52.221371  6772 tensor_math_cpp.h:146] not equal stride
   [       OK ] TensorMath.TransposeIntCpp (0 ms)
   [ RUN      ] TensorMath.BroadcastCpp
   [       OK ] TensorMath.BroadcastCpp (0 ms)
   [ RUN      ] TensorMath.L2Cpp
   [       OK ] TensorMath.L2Cpp (0 ms)
   [ RUN      ] TensorMath.MultCpp
   [       OK ] TensorMath.MultCpp (0 ms)
   [ RUN      ] TensorMath.AddColumnCpp
   [       OK ] TensorMath.AddColumnCpp (0 ms)
   [ RUN      ] TensorMath.SubColumnCpp
   [       OK ] TensorMath.SubColumnCpp (0 ms)
   [ RUN      ] TensorMath.DivColumnCpp
   [       OK ] TensorMath.DivColumnCpp (0 ms)
   [ RUN      ] TensorMath.AddRowCpp
   [       OK ] TensorMath.AddRowCpp (0 ms)
   [ RUN      ] TensorMath.SubRowCpp
   [       OK ] TensorMath.SubRowCpp (0 ms)
   [ RUN      ] TensorMath.MultRowCpp
   [       OK ] TensorMath.MultRowCpp (0 ms)
   [ RUN      ] TensorMath.MultColumnCpp
   [       OK ] TensorMath.MultColumnCpp (0 ms)
   [ RUN      ] TensorMath.DivRowCpp
   [       OK ] TensorMath.DivRowCpp (0 ms)
   [ RUN      ] TensorMath.SumRowsCpp
   [       OK ] TensorMath.SumRowsCpp (0 ms)
   [ RUN      ] TensorMath.SumColumnsCpp
   [       OK ] TensorMath.SumColumnsCpp (0 ms)
   [ RUN      ] TensorMath.ConcatenateRowsCpp
   [       OK ] TensorMath.ConcatenateRowsCpp (0 ms)
   [ RUN      ] TensorMath.ConcatenateColumnsCpp
   [       OK ] TensorMath.ConcatenateColumnsCpp (0 ms)
   [ RUN      ] TensorMath.CopyRowsCpp
   [       OK ] TensorMath.CopyRowsCpp (0 ms)
   [ RUN      ] TensorMath.CopyColumnsCpp
   [       OK ] TensorMath.CopyColumnsCpp (0 ms)
   [ RUN      ] TensorMath.L2Cuda
   [       OK ] TensorMath.L2Cuda (19 ms)
   [ RUN      ] TensorMath.MultCuda
   [       OK ] TensorMath.MultCuda (22 ms)
   [ RUN      ] TensorMath.AddColumnCuda
   [       OK ] TensorMath.AddColumnCuda (23 ms)
   [ RUN      ] TensorMath.SubColumnCuda
   [       OK ] TensorMath.SubColumnCuda (22 ms)
   [ RUN      ] TensorMath.MultColumnCuda
   [       OK ] TensorMath.MultColumnCuda (23 ms)
   [ RUN      ] TensorMath.DivColumnCuda
   [       OK ] TensorMath.DivColumnCuda (22 ms)
   [ RUN      ] TensorMath.AddRowCuda
   [       OK ] TensorMath.AddRowCuda (23 ms)
   [ RUN      ] TensorMath.SubRowCuda
   [       OK ] TensorMath.SubRowCuda (22 ms)
   [ RUN      ] TensorMath.MultRowCuda
   [       OK ] TensorMath.MultRowCuda (23 ms)
   [ RUN      ] TensorMath.DivRowCuda
   [       OK ] TensorMath.DivRowCuda (23 ms)
   [ RUN      ] TensorMath.SumRowsCuda
   [       OK ] TensorMath.SumRowsCuda (22 ms)
   [ RUN      ] TensorMath.SumColumnCuda
   [       OK ] TensorMath.SumColumnCuda (23 ms)
   [ RUN      ] TensorMath.ExpStrideCuda
   [       OK ] TensorMath.ExpStrideCuda (22 ms)
   [ RUN      ] TensorMath.ConcatenateRowsCuda
   [       OK ] TensorMath.ConcatenateRowsCuda (23 ms)
   [ RUN      ] TensorMath.ConcatenateColumnsCuda
   [       OK ] TensorMath.ConcatenateColumnsCuda (22 ms)
   [ RUN      ] TensorMath.CopyRowsCuda
   [       OK ] TensorMath.CopyRowsCuda (23 ms)
   [ RUN      ] TensorMath.CopyColumnsCuda
   [       OK ] TensorMath.CopyColumnsCuda (22 ms)
   [ RUN      ] TensorMath.RowMaxCuda
   [       OK ] TensorMath.RowMaxCuda (23 ms)
   [ RUN      ] TensorMath.BroadcastCuda
   [       OK ] TensorMath.BroadcastCuda (23 ms)
   [----------] 67 tests from TensorMath (461 ms total)
   
   [----------] 2 tests from TextFileWriter
   [ RUN      ] TextFileWriter.Create
   [       OK ] TextFileWriter.Create (0 ms)
   [ RUN      ] TextFileWriter.Append
   [       OK ] TextFileWriter.Append (0 ms)
   [----------] 2 tests from TextFileWriter (0 ms total)
   
   [----------] 2 tests from TextFileReader
   [ RUN      ] TextFileReader.Read
   [       OK ] TextFileReader.Read (0 ms)
   [ RUN      ] TextFileReader.SeekToFirst
   [       OK ] TextFileReader.SeekToFirst (0 ms)
   [----------] 2 tests from TextFileReader (0 ms total)
   
   [----------] 1 test from TimerTest
   [ RUN      ] TimerTest.TestTick
   [       OK ] TimerTest.TestTick (10 ms)
   [----------] 1 test from TimerTest (10 ms total)
   
   [----------] Global test environment tear-down
   [==========] 238 tests from 48 test cases ran. (3721 ms total)
   [  PASSED  ] 238 tests.
   
   ubuntu@ip-172-31-35-106:~/singa/test/python$ python3 test_operation.py
   ........................................................................................WARNING: Logging before InitGoogleLogging() is written to STDERR
   I1125 06:25:48.732702  6894 tensor_math_cpp.h:146] not equal stride
   I1125 06:25:48.733002  6894 tensor_math_cpp.h:146] not equal stride
   .I1125 06:25:48.733582  6894 tensor_math_cpp.h:146] not equal stride
   I1125 06:25:48.733937  6894 tensor_math_cpp.h:146] not equal stride
   ......
   ----------------------------------------------------------------------
   Ran 95 tests in 0.793s
   
   OK
   
   ubuntu@ip-172-31-35-106:~/singa/test/python$ python3 test_api.py
   ....
   ----------------------------------------------------------------------
   Ran 4 tests in 0.027s
   
   OK
   
   ubuntu@ip-172-31-35-106:~/singa/examples/autograd$ python3 mnist_cnn.py
   Starting Epoch 0:
   Training loss = 580.591003, training accuracy = 0.795157
   Evaluation accuracy = 0.945112, Elapsed Time = 3.683850s
   Starting Epoch 1:
   Training loss = 233.889755, training accuracy = 0.921642
   Evaluation accuracy = 0.964844, Elapsed Time = 3.560193s
   Starting Epoch 2:
   Training loss = 168.449493, training accuracy = 0.943820
   Evaluation accuracy = 0.967548, Elapsed Time = 3.550486s
   Starting Epoch 3:
   Training loss = 134.752670, training accuracy = 0.954492
   Evaluation accuracy = 0.974259, Elapsed Time = 3.553507s
   Starting Epoch 4:
   Training loss = 115.290237, training accuracy = 0.961496
   Evaluation accuracy = 0.978666, Elapsed Time = 3.555326s
   Starting Epoch 5:
   Training loss = 104.051346, training accuracy = 0.964915
   Evaluation accuracy = 0.980268, Elapsed Time = 3.564088s
   Starting Epoch 6:
   Training loss = 91.083107, training accuracy = 0.969500
   Evaluation accuracy = 0.979768, Elapsed Time = 3.549129s
   Starting Epoch 7:
   Training loss = 87.020187, training accuracy = 0.971718
   Evaluation accuracy = 0.980469, Elapsed Time = 3.552619s
   Starting Epoch 8:
   Training loss = 80.623665, training accuracy = 0.972869
   Evaluation accuracy = 0.980469, Elapsed Time = 3.559611s
   Starting Epoch 9:
   Training loss = 76.131233, training accuracy = 0.974303
   Evaluation accuracy = 0.974860, Elapsed Time = 3.551645s
   
   ubuntu@ip-172-31-35-106:~/singa/examples/autograd$ python3 resnet.py
   Start intialization............
   100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:10<00:00,  1.42it/s]
   Throughput = 45.51236531234027 per second
   Total=0.7031056237220764, forward=0.21804919958114624, softmax=0.0019757962226867676, backward=0.4830806279182434, sgd=0.016749765872955322
   ```
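
   The resnet.py figures above can be cross-checked with a little arithmetic. This sketch assumes, without confirmation from the log itself, that Total and the per-phase values are seconds per iteration and that Throughput is images per second; under that assumption their product is the batch size.

```python
# Values copied from the resnet.py output in the log.
throughput = 45.51236531234027          # "per second"
total = 0.7031056237220764              # assumed: seconds per iteration
phases = {
    "forward": 0.21804919958114624,
    "softmax": 0.0019757962226867676,
    "backward": 0.4830806279182434,
    "sgd": 0.016749765872955322,
}

# If the unit assumptions hold, this is the number of images per iteration.
implied_batch = throughput * total

# Fraction of Total attributed to each phase.
shares = {name: t / total for name, t in phases.items()}

print(f"implied batch size: {implied_batch:.3f}")
for name, share in shares.items():
    print(f"{name:>8}: {share:.1%} of Total")
```

   The product comes out at about 32, and the backward pass accounts for roughly 69% of Total. The four phase timings add up to slightly more than Total, so they should be read as approximate rather than as an exact decomposition.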
   
