
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/03/05 19:50:00 UTC

[GitHub] [incubator-mxnet] ptrendx commented on a change in pull request #17767: [WIP] Fix and optimize handling of vectorized memory accesses

URL: https://github.com/apache/incubator-mxnet/pull/17767#discussion_r388525308
 
 

 ##########
 File path: src/common/cuda_vectorization.cuh
 ##########
 @@ -0,0 +1,228 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ *  Copyright (c) 2020 by Contributors
+ * \file cuda_vectorization.cuh
+ * \brief GPU helpers for vectorized memory accesses
+ */
+
+#ifndef MXNET_COMMON_CUDA_VECTORIZATION_CUH_
+#define MXNET_COMMON_CUDA_VECTORIZATION_CUH_
+
+#include <cuda_runtime.h>
+#include "cuda_utils.h"
+
+#if MXNET_USE_CUDA && __CUDACC__
+
+namespace mxnet {
+namespace common {
+namespace cuda {
+
+template <typename DType, typename LType>
+class VectorizedStorage {
+ public:
+  constexpr static int nvec = sizeof(LType) / sizeof(DType);
+  union vectorized_storage {
+    LType aligned;
+    DType separate[nvec];  // NOLINT(*)
+
+    MSHADOW_XINLINE vectorized_storage() {}
+    MSHADOW_XINLINE ~vectorized_storage() {}
 
 Review comment:
   MSHADOW_XINLINE is useful because it carries no CUDA qualifiers when you are not compiling with NVCC (GCC does not understand `__host__ __device__`).
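
   A minimal sketch of the pattern this comment refers to, for illustration only. `EXAMPLE_XINLINE`, `ExampleStorage`, and `sum_pair` are hypothetical names, not mshadow or MXNet identifiers, and the real `MSHADOW_XINLINE` definition may differ in detail (for example, it may also force inlining). The idea is that the CUDA execution-space qualifiers are only emitted when `__CUDACC__` is defined, so the same header also compiles with a plain host compiler:

```cpp
// Hypothetical stand-in for MSHADOW_XINLINE (assumption, not the real definition).
// Under NVCC (__CUDACC__ defined) it adds the CUDA execution-space qualifiers;
// under a host-only compiler such as GCC it reduces to plain `inline`.
#ifdef __CUDACC__
#define EXAMPLE_XINLINE inline __host__ __device__
#else
#define EXAMPLE_XINLINE inline
#endif

// Simplified analogue of the VectorizedStorage class in the diff above.
template <typename DType, typename LType>
struct ExampleStorage {
  // Number of DType elements that fit in one LType-wide load.
  static constexpr int nvec = sizeof(LType) / sizeof(DType);

  union Storage {
    LType aligned;         // one wide, aligned element
    DType separate[nvec];  // the same bytes viewed as individual elements

    // The qualifiers come from the macro, so this union is usable
    // from host code and, when built with NVCC, from device code too.
    EXAMPLE_XINLINE Storage() {}
    EXAMPLE_XINLINE ~Storage() {}
  } data;
};

// Small helper that compiles unchanged with or without NVCC.
EXAMPLE_XINLINE float sum_pair(float a, float b) {
  ExampleStorage<float, double> s;
  s.data.separate[0] = a;
  s.data.separate[1] = b;
  return s.data.separate[0] + s.data.separate[1];
}
```

   Writing `__host__ __device__` directly on the constructor and destructor would fail as soon as the header is pulled into a translation unit built without NVCC, which is the situation the macro avoids.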
