Posted to commits@tvm.apache.org by ma...@apache.org on 2021/05/29 07:06:47 UTC
[tvm] branch main updated: [Docs] Added developer documentation for DeviceAPI and Target. (#8082)

This is an automated email from the ASF dual-hosted git repository.

masahi pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
     new d78cd07  [Docs] Added developer documentation for DeviceAPI and Target. (#8082)
d78cd07 is described below

commit d78cd07864a90f4aedcc89d48e8af00c8e5014be
Author: Lunderberg <Lu...@users.noreply.github.com>
AuthorDate: Sat May 29 00:06:27 2021 -0700

    [Docs] Added developer documentation for DeviceAPI and Target. (#8082)
    
    * [Docs] Added developer documentation for DeviceAPI and Target.
    
    * [Docs] Update on the DeviceAPI/Target documentation.
    
    - Clarified wording based on suggestions from @csullivan
    - Fixed incorrect links to `c_runtime_api.h`
    
    * [Docs] Update on the DeviceAPI/Target documentation.
    
    - Switched from argument style example of `tvm.target.Target` to a
      JSON-formatted string, based on @zxybazh's suggestion.
    
    Co-authored-by: Eric Lunderberg <el...@octoml.ai>
---
 docs/dev/device_target_interactions.rst | 238 ++++++++++++++++++++++++++++++++
 docs/dev/index.rst                      |   6 +
 docs/dev/runtime.rst                    |   9 +-
 3 files changed, 251 insertions(+), 2 deletions(-)

diff --git a/docs/dev/device_target_interactions.rst b/docs/dev/device_target_interactions.rst
new file mode 100644
index 0000000..373b8fe
--- /dev/null
+++ b/docs/dev/device_target_interactions.rst
@@ -0,0 +1,238 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+.. _tvm-target-specific-overview:
+
+Device/Target Interactions
+--------------------------
+
+This document is intended for developers interested in understanding
+how the TVM framework interacts with specific device APIs, or who
+may want to implement support for a new API or new hardware.
+
+There are three main aspects that must be implemented for any new
+runtime environment.
+
+* The :ref:`DeviceAPI <tvm-target-specific-device-api>` class gives a
+  handle to a specific device, and the API used to interact with it.
+  It defines a common interface for querying device parameters
+  (e.g. memory available, number of threads, etc.) and for performing
+  simple actions (e.g. copying memory from the host, or between
+  buffers on the device).
+
+* The :ref:`Target <tvm-target-specific-target>` class contains a
+  description of the device on which a function will run.  It is
+  exposed both to the target code generators and to the optimization
+  passes.
+
+* The :ref:`target code generators <tvm-target-specific-codegen>`
+  construct a :ref:`Module <tvm-runtime-system-module>`, consisting of
+  one or more :ref:`PackedFunc <tvm-runtime-system-packed-func>`, from
+  an IRModule.
+
+.. _tvm-target-specific-device-api:
+
+DeviceAPI
+---------
+
+The ``DeviceAPI`` represents a handle to a specific hardware device
+API.  (e.g. ``CUDADeviceAPI`` handles all interactions through the
+CUDA framework.)  Most ``DeviceAPI`` methods accept a ``device_id``
+parameter to specify which device should be accessed.  In Python,
+devices are typically accessed using the :py:func:`tvm.runtime.device`
+function, which returns a handle to a specific device, accessed
+through a specific API.  (e.g. ``tvm.runtime.device('cuda', 0)`` gives
+access to physical device ``0``, accessed through the CUDA API.)
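The pairing of a device API with a device index can be sketched in standalone C++. This is a minimal, hypothetical model: ``DeviceType``, ``Device``, and ``MakeDevice`` are illustrative names rather than TVM's API (the real code uses ``DLDeviceType`` values), and the name-to-type mapping only loosely mirrors :py:func:`tvm.runtime.device`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for a device-type enum. The real
// enumerators (DLDeviceType / TVMDeviceExtType) have different names
// and values.
enum class DeviceType { kCPU, kCUDA, kVulkan };

// A device is identified by the API used to reach it plus an index.
struct Device {
  DeviceType device_type;  // which API handles the device
  int device_id;           // which physical device of that type
};

// Rough analogue of tvm.runtime.device("cuda", 0): map an API name and
// an index to a (device_type, device_id) pair.
Device MakeDevice(const std::string& name, int device_id = 0) {
  assert(device_id >= 0 && "device ids are non-negative indices");
  if (name == "cpu") return {DeviceType::kCPU, device_id};
  if (name == "cuda") return {DeviceType::kCUDA, device_id};
  if (name == "vulkan") return {DeviceType::kVulkan, device_id};
  throw std::invalid_argument("unknown device name: " + name);
}
```

In this sketch, ``MakeDevice("cuda", 0)`` and ``MakeDevice("cuda", 1)`` share a device type but name different physical devices, which is why the ``DeviceAPI`` methods take a ``device_id`` argument.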
+
+.. _device_api.h: https://github.com/apache/tvm/blob/main/include/tvm/runtime/device_api.h
+
+* Attribute queries - ``GetAttr`` allows different
+  device-specific parameters to be queried, such as the device name,
+  number of threads, etc.  The parameters that can be queried are
+  defined in ``enum DeviceAttrKind`` in `device_api.h`_.  Not all
+  query-able parameters are supported by all devices.  If a parameter
+  cannot be queried (e.g. ``kMaxClockRate`` on Vulkan), or if a
+  parameter isn't applicable (e.g. ``kWarpSize`` on CPU), then those
+  queries should return ``nullptr``.
+
+* Setting active device - ``SetDevice`` should set a
+  particular device as being active.  If a ``PackedFunc`` generated by
+  the target-specific code gen requires execution on a device, it
+  should run on the active device.
+
+* Memory management - Utilities for allocating and deallocating memory
+  on the device.
+
+  * Allocate data space - ``AllocDataSpace`` and ``FreeDataSpace``
+    allocate and free space on the device.  These allocations can be
+    provided as inputs and outputs to an operator and make up the
+    primary data flow of the operator graph.  It must be possible to
+    transfer data between the host and a data space.  The return
+    value is an opaque ``void*``.  While some implementations return a
+    memory address, this is not required, and the ``void*`` may be an
+    opaque handle that is interpretable only by the device backend
+    that generated it.  The ``void*`` is used as an argument to other
+    backend-specific functions, such as ``CopyDataFromTo``.
+
+  * Allocate work space - ``AllocWorkspace`` and ``FreeWorkspace``
+    allocate and free space on the device.  Unlike data space, these
+    are used for storage of intermediate values within an operator
+    definition, and are not required to be transferable to/from the
+    host device.  If a ``DeviceAPI`` subclass does not implement these
+    methods, they will default to calling the corresponding
+    ``DataSpace`` functions.
+
+  * Copy data - ``CopyDataFromTo`` should copy data from one location
+    to another.  The type of copy is determined by the ``dev_from``
+    and ``dev_to`` parameters.  Implementations should support copying
+    memory from CPU to device, from device to CPU, and from one buffer
+    to another on a single device.  If the source or destination
+    locations are on the CPU, the corresponding ``void*`` points to a
+    CPU address that can be passed into ``memcpy``.  If the source or
+    destination locations are on the device, the corresponding
+    ``void*`` was previously generated by either ``AllocDataSpace`` or
+    ``AllocWorkspace``.
+
+    These copies are queued to execute on a specific
+    ``TVMStreamHandle``.  However, implementations should not assume
+    that CPU buffers remain valid or accessible after the call to
+    ``CopyDataFromTo`` completes.
+
+
+* Execution stream management - Utilities for handling
+  ``TVMStreamHandle``, which represents parallel streams of execution
+  used to execute commands.
+
+  * Create stream - ``CreateStream`` and ``FreeStream`` should
+    allocate/free a handle to a stream of execution. If a device
+    implements only a single queue of commands, then ``CreateStream``
+    should return ``nullptr``.
+
+  * Set active stream - ``SetStream`` should set a stream as being
+    active.  While active, if a ``PackedFunc`` generated by the
+    target-specific code gen requires execution on a device, the work
+    should be submitted to the active stream.
+
+  * Synchronize to CPU - ``StreamSync`` should synchronize a stream of
+    execution to the CPU.  The call to ``StreamSync`` should return
+    once all memory transfers and computations submitted prior to the
+    ``StreamSync`` call have completed.
+
+  * Synchronize between streams - ``SyncStreamFromTo`` should
+    introduce a synchronization barrier between the source and
+    destination stream.  That is, the destination stream may not
+    proceed beyond commands currently queued until the source stream
+    has completed all commands that are currently queued.
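The responsibilities listed above can be sketched as an abstract interface plus a trivial CPU backend. This is a simplified, hypothetical model: ``MiniDeviceAPI``, ``AttrKind``, and the method signatures are assumptions for illustration and do not match `device_api.h`_ exactly (for instance, the real ``CopyDataFromTo`` also receives the ``dev_from`` and ``dev_to`` arguments described above). An empty ``std::optional`` stands in for the ``nullptr`` returned by unsupported queries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical, simplified interface modeled on the DeviceAPI duties
// described above. Names and signatures are illustrative, not TVM's.
enum class AttrKind { kDeviceName, kWarpSize, kMaxThreadsPerBlock };
using StreamHandle = void*;

class MiniDeviceAPI {
 public:
  virtual ~MiniDeviceAPI() = default;
  virtual void SetDevice(int device_id) = 0;
  // An empty optional means "unsupported or not applicable".
  virtual std::optional<std::string> GetAttr(int device_id, AttrKind kind) = 0;
  virtual void* AllocDataSpace(int device_id, std::size_t nbytes) = 0;
  virtual void FreeDataSpace(int device_id, void* ptr) = 0;
  // Workspaces fall back to data-space allocations by default.
  virtual void* AllocWorkspace(int device_id, std::size_t nbytes) {
    return AllocDataSpace(device_id, nbytes);
  }
  virtual void FreeWorkspace(int device_id, void* ptr) { FreeDataSpace(device_id, ptr); }
  virtual void CopyDataFromTo(const void* from, void* to, std::size_t nbytes,
                              StreamHandle stream) = 0;
  // A device with a single command queue returns nullptr.
  virtual StreamHandle CreateStream(int /*device_id*/) { return nullptr; }
  virtual void FreeStream(int /*device_id*/, StreamHandle /*stream*/) {}
  virtual void StreamSync(int device_id, StreamHandle stream) = 0;
};

// CPU backend: data space is host memory, copies are memcpy, and all
// work is synchronous, so StreamSync has nothing to wait for.
class MiniCPUDeviceAPI final : public MiniDeviceAPI {
 public:
  void SetDevice(int) override {}  // only one CPU "device"
  std::optional<std::string> GetAttr(int, AttrKind kind) override {
    if (kind == AttrKind::kDeviceName) return std::string("cpu");
    return std::nullopt;  // e.g. a warp size does not apply to a CPU
  }
  void* AllocDataSpace(int, std::size_t nbytes) override { return std::malloc(nbytes); }
  void FreeDataSpace(int, void* ptr) override { std::free(ptr); }
  void CopyDataFromTo(const void* from, void* to, std::size_t nbytes,
                      StreamHandle) override {
    assert(nbytes == 0 || (from != nullptr && to != nullptr));
    std::memcpy(to, from, nbytes);
  }
  void StreamSync(int, StreamHandle) override {}
};
```

Because this CPU backend runs everything synchronously, it keeps the single-queue default of returning ``nullptr`` from ``CreateStream``. An asynchronous backend would need real implementations of both ``CreateStream`` and ``StreamSync``.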
+
+
+In order to be usable by the TVM framework, the new DeviceAPI should
+then be registered with the following steps.
+
+#. Create a function that instantiates the new DeviceAPI, and returns
+   a pointer to it::
+
+     FooDeviceAPI* FooDeviceAPI::Global() {
+       static FooDeviceAPI inst;
+       return &inst;
+     }
+
+#. Register the function to the tvm registry::
+
+     TVM_REGISTER_GLOBAL("device_api.foo").set_body_typed(FooDeviceAPI::Global);
+
+#. Add an entry for the new DeviceAPI to the ``TVMDeviceExtType`` enum
+   in `c_runtime_api.h`_.  The value should be an unused value greater
+   than ``DLDeviceType::kDLExtDev``, but less than
+   ``DeviceAPIManager::kMaxDeviceAPI``.
+
+#. Add a case in ``DeviceName`` in `device_api.h`_ to convert from the
+   enum value to a string representation.  This string representation
+   should match the name given to ``TVM_REGISTER_GLOBAL``.
+
+#. Add entries to the ``MASK2STR`` and ``STR2MASK`` dictionaries of
+   :py:class:`tvm.runtime.Device` for the new enum value.
+
+.. _c_runtime_api.h: https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h
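The first four registration steps can be sketched with a toy registry. Everything here is hypothetical: ``Registry``, ``GetDeviceAPI``, ``kFooDevice``, and the two bound constants are stand-ins (the real bounds are ``DLDeviceType::kDLExtDev`` and ``DeviceAPIManager::kMaxDeviceAPI``, whose values are not reproduced here). The lookup assumes the registry key is formed as ``"device_api." + DeviceName(type)``, which is why the two names must agree.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Illustrative bounds only; NOT the real kDLExtDev / kMaxDeviceAPI values.
constexpr int kExtDevBound = 12;
constexpr int kMaxDeviceAPI = 64;

// Step 3: a hypothetical new enum entry, placed between the two bounds.
enum ExtDeviceType : int { kFooDevice = kExtDevBound + 1 };
static_assert(kFooDevice > kExtDevBound && kFooDevice < kMaxDeviceAPI,
              "device type must lie between the two bounds");

struct DeviceAPIBase {
  virtual ~DeviceAPIBase() = default;
};

// Step 1: a singleton accessor, mirroring FooDeviceAPI::Global().
struct FooDeviceAPI : DeviceAPIBase {
  static FooDeviceAPI* Global() {
    static FooDeviceAPI inst;
    return &inst;
  }
};

// Step 2: a toy name -> factory table standing in for TVM_REGISTER_GLOBAL.
std::map<std::string, std::function<DeviceAPIBase*()>>& Registry() {
  static std::map<std::string, std::function<DeviceAPIBase*()>> reg;
  return reg;
}

// Step 4: enum value -> string; must match the registered name's suffix.
std::string DeviceName(int type) {
  if (type == kFooDevice) return "foo";
  throw std::invalid_argument("unknown device type");
}

// Assumed lookup scheme: "device_api." + DeviceName(type).
DeviceAPIBase* GetDeviceAPI(int type) {
  auto it = Registry().find("device_api." + DeviceName(type));
  if (it == Registry().end()) throw std::runtime_error("device API not registered");
  return it->second();
}

// Registration during static initialization, as the macro would do.
const bool kFooRegistered = [] {
  Registry()["device_api.foo"] = [] {
    return static_cast<DeviceAPIBase*>(FooDeviceAPI::Global());
  };
  return true;
}();
```

If ``DeviceName`` returned a string that did not match the registered key, ``GetDeviceAPI`` would fail to find the factory, which is the failure the naming rule in step 4 guards against.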
+
+
+.. _tvm-target-specific-target:
+
+Target Definition
+-----------------
+
+The ``Target`` object is a lookup table of properties about a physical
+device, its hardware/driver limits, and its capabilities.  The
+``Target`` is accessible both during optimization and code generation
+stages.  While the same ``Target`` class is used for all runtime
+targets, each runtime target may need to add target-specific options.
+
+.. _target_kind.cc: https://github.com/apache/tvm/blob/main/src/target/target_kind.cc
+
+In `target_kind.cc`_, add a new declaration of
+``TVM_REGISTER_TARGET_KIND``, passing a string name of the new target,
+and the ``TVMDeviceExtType`` or ``DLDeviceType`` enum value for the
+device on which that target should run.  Typically, the target name
+and the device name will match.  (e.g. The ``"cuda"`` target runs on
+the ``kDLCUDA`` device.)  There are exceptions, such as when multiple
+different code generation targets can run on the same physical device.
+(e.g. The ``"llvm"`` and ``"c"`` targets both run on the ``kDLCPU``
+device type.)
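The mapping from target kinds to device types can be modeled as a simple table. This is an illustrative stand-in for ``TVM_REGISTER_TARGET_KIND``. ``TargetKindTable`` and the ``DeviceType`` enum are hypothetical, and ``"foo"`` is the same placeholder name used elsewhere on this page.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Illustrative device-type enum; the real code uses DLDeviceType /
// TVMDeviceExtType values.
enum class DeviceType { kCPU, kCUDA, kFoo };

// Toy stand-in for TVM_REGISTER_TARGET_KIND: each target kind name is
// registered with the device type its generated code runs on.
class TargetKindTable {
 public:
  void Register(const std::string& kind, DeviceType device) {
    bool inserted = table_.emplace(kind, device).second;
    if (!inserted) throw std::logic_error("target kind registered twice: " + kind);
  }
  // Throws std::out_of_range for an unregistered kind.
  DeviceType DeviceFor(const std::string& kind) const { return table_.at(kind); }

 private:
  std::map<std::string, DeviceType> table_;
};

TargetKindTable MakeDefaultTable() {
  TargetKindTable t;
  t.Register("cuda", DeviceType::kCUDA);  // target name matches the device
  t.Register("llvm", DeviceType::kCPU);   // two code generation targets...
  t.Register("c", DeviceType::kCPU);      // ...sharing one device type
  t.Register("foo", DeviceType::kFoo);    // hypothetical new target
  return t;
}
```

The table is keyed by target kind rather than by device type, so several kinds can map to the same device, as with ``"llvm"`` and ``"c"``.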
+
+All options for a specific target kind are added with the
+``add_attr_option`` function, with optional default values.  A
+preprocessor can be added with ``set_attrs_preprocessor`` to define
+any parameters that are computed dynamically from other parameters or
+queried from device properties.
+
+These attribute definitions also define a parser that can unpack a
+string description of a target.  This is done in the
+``Target::Target(const String&)`` constructor in C++, which accepts a
+JSON-formatted string and is typically called through the
+:py:class:`tvm.target.Target` Python object.  For example,
+``tvm.target.Target('{"kind": "cuda", "max_num_threads": 1024}')``
+will create a ``cuda`` target, while overriding the default maximum
+number of threads.
+
+In a code generator, the target properties can be accessed using
+``target->GetAttr<T>(param_name)`` in C++, or with the
+``target.attrs`` dictionary in Python.
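The attribute handling described above (declared options with defaults, a preprocessor for derived values, overrides from a parsed description, and typed lookup) can be sketched as follows. ``MiniTargetKind`` and ``MiniTarget`` are hypothetical toy classes whose method names only echo ``add_attr_option``, ``set_attrs_preprocessor``, and ``GetAttr<T>``. They assume every option has a default, skip JSON parsing by taking an already-parsed map, and reject unknown or mistyped keys.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical toy model of target attributes; not TVM's implementation.
using AttrValue = std::variant<std::int64_t, std::string>;
using AttrMap = std::map<std::string, AttrValue>;

struct MiniTargetKind {
  std::string name;
  AttrMap defaults;                              // declared options + defaults
  std::function<AttrMap(AttrMap)> preprocessor;  // optional derived attributes

  MiniTargetKind& add_attr_option(const std::string& key, AttrValue default_value) {
    defaults[key] = std::move(default_value);
    return *this;
  }
  MiniTargetKind& set_attrs_preprocessor(std::function<AttrMap(AttrMap)> f) {
    preprocessor = std::move(f);
    return *this;
  }
};

class MiniTarget {
 public:
  // `overrides` stands in for the already-parsed JSON object.
  MiniTarget(const MiniTargetKind& kind, const AttrMap& overrides)
      : attrs_(kind.defaults) {
    for (const auto& [key, value] : overrides) {
      auto it = kind.defaults.find(key);
      if (it == kind.defaults.end()) throw std::invalid_argument("unknown attribute: " + key);
      if (value.index() != it->second.index())
        throw std::invalid_argument("wrong type for attribute: " + key);
      attrs_[key] = value;
    }
    // Derived attributes are computed after user overrides are applied.
    if (kind.preprocessor) attrs_ = kind.preprocessor(attrs_);
  }

  // Typed lookup: empty if the key is missing or holds another type.
  template <typename T>
  std::optional<T> GetAttr(const std::string& key) const {
    auto it = attrs_.find(key);
    if (it == attrs_.end()) return std::nullopt;
    if (const T* v = std::get_if<T>(&it->second)) return *v;
    return std::nullopt;
  }

 private:
  AttrMap attrs_;
};
```

For example, a hypothetical kind declaring ``max_num_threads`` (default 256) and ``thread_warp_size`` (default 32), with a preprocessor that derives ``num_warps`` by division, reports ``num_warps`` as 32 once ``max_num_threads`` is overridden to 1024.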
+
+
+.. _tvm-target-specific-codegen:
+
+Target Code Generators
+----------------------
+
+The code generators take an optimized ``IRModule`` and convert it
+into an executable representation.  Each code generator must be
+registered in order to be used by the TVM framework.  This is done by
+registering a function named ``"target.build.foo"``, where ``foo`` is
+the same name as was used in the ``TVM_REGISTER_TARGET_KIND``
+definition above. ::
+
+  tvm::runtime::Module GeneratorFooCode(IRModule mod, Target target);
+  TVM_REGISTER_GLOBAL("target.build.foo").set_body_typed(GeneratorFooCode);
+
+The code generator takes two arguments.  The first is the ``IRModule``
+to compile, and the second is the ``Target`` that describes the device
+on which the code should run.  Because the environment performing the
+compilation is not necessarily the same as the environment that will
+be executing the code, code generators should not perform any
+attribute lookups on the device itself, and should instead access
+parameters stored in the ``Target``.
+
+Each function in the input ``IRModule`` should be accessible by name
+in the output ``runtime::Module``.
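The registration convention and the by-name requirement can be sketched with toy types. ``MiniIRModule``, ``MiniTarget``, ``MiniModule``, ``Build``, and ``GenerateFooCode`` are hypothetical stand-ins, not TVM classes. The sketch assumes dispatch on ``"target.build." + kind`` as described above, and its generator reads ``max_num_threads`` from the target's attributes rather than querying any device.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-ins: an "IRModule" lists named functions, a "Target" carries
// its kind and attributes, and a runtime "Module" maps names to callables.
struct MiniIRModule { std::vector<std::string> function_names; };
struct MiniTarget { std::string kind; std::map<std::string, int> attrs; };
struct MiniModule {
  std::map<std::string, std::function<std::string()>> functions;
  const std::function<std::string()>& GetFunction(const std::string& name) const {
    return functions.at(name);  // throws std::out_of_range if absent
  }
};

using CodeGenerator = std::function<MiniModule(const MiniIRModule&, const MiniTarget&)>;

std::map<std::string, CodeGenerator>& CodeGenRegistry() {
  static std::map<std::string, CodeGenerator> reg;
  return reg;
}

// Dispatch on "target.build.<kind>", following the naming convention above.
MiniModule Build(const MiniIRModule& mod, const MiniTarget& target) {
  auto it = CodeGenRegistry().find("target.build." + target.kind);
  if (it == CodeGenRegistry().end())
    throw std::runtime_error("no code generator for " + target.kind);
  return it->second(mod, target);
}

// Hypothetical "foo" generator: reads limits from the Target only, and
// exposes every input function under its original name.
MiniModule GenerateFooCode(const MiniIRModule& mod, const MiniTarget& target) {
  int threads = target.attrs.at("max_num_threads");
  MiniModule out;
  for (const std::string& name : mod.function_names) {
    out.functions[name] = [name, threads] {
      return name + " compiled for " + std::to_string(threads) + " threads";
    };
  }
  return out;
}

const bool kFooCodeGenRegistered = [] {
  CodeGenRegistry()["target.build.foo"] = GenerateFooCode;
  return true;
}();
```

Since the generator consults only ``target.attrs``, the same build would produce the same module on a machine with no ``foo`` device attached, which is the point of keeping device lookups out of code generation.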
diff --git a/docs/dev/index.rst b/docs/dev/index.rst
index 7eeecc1..339cfbc 100644
--- a/docs/dev/index.rst
+++ b/docs/dev/index.rst
@@ -24,9 +24,15 @@ This page is organized as follows:
 
 - The `Example Compilation Flow`_ gives an overview of the steps that TVM takes to turn a high level description of a model into a deployable module.
   To get started, please read this section first.
+
 - The `Logical Architecture Components`_ section describes the logical components.
   The sections after are specific guides focused on each logical component, organized
   by the component's name.
+
+- The `Device/Target Interactions`_ section describes how TVM
+  interacts with each supported physical device and code-generation
+  target.
+
 - Feel free to also check out the :ref:`dev-how-to` for useful development tips.
 
 This guide provides a few complementary views of the architecture.
diff --git a/docs/dev/runtime.rst b/docs/dev/runtime.rst
index fc03ed8..bc1b035 100644
--- a/docs/dev/runtime.rst
+++ b/docs/dev/runtime.rst
@@ -42,8 +42,11 @@ We also want the runtime core to be minimal to deploy to embedded devices.
 PackedFunc
 ----------
 
-`PackedFunc`_ is a simple but elegant solution
-we find to solve the challenges listed. The following code block provides an example in C++
+`PackedFunc`_ is a simple but elegant solution to the challenges
+listed above.  A single ``PackedFunc`` object represents a
+function call whose caller and callee may be in different languages.
+
+The following code block provides an example in C++:
 
 .. _PackedFunc: https://github.com/apache/tvm/blob/main/include/tvm/runtime/packed_func.h
 
@@ -147,6 +150,8 @@ The overhead of calling into PackedFunc vs. a normal function is small, as it is
 So it is OK as long as we don't wrap small functions.
 In summary, the PackedFunc is the universal glue in TVM where we use it extensively to support our compiler and deployment.
 
+.. _tvm-runtime-system-module:
+
 Module
 ------