Posted to dev@tvm.apache.org by James Gilles <no...@github.com> on 2019/04/08 01:31:45 UTC

[dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

(Moved from #2981)

Proposal: add type signature metadata to PackedFunc

Each `PackedFunc` would have an optional field describing what arguments it takes and what type it returns. This field would be automatically populated during conversion from a `TypedPackedFunc`.

Once we have these type signatures, we could use them to generate bindings in new languages (e.g., Rust) and get coverage of a good chunk of TVM's API for free.

This would result in much easier-to-maintain language bindings (not only for the runtime but for the compiler stack too) and might allow us to eventually rip out a lot of the manually written code in, e.g., the Python bindings.
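
As a rough illustration of the idea, here is a toy Python sketch. It is not TVM's actual API: the `Signature` record, `register_typed`, `register_untyped`, `emit_rust_stub`, and the type-name mapping are all hypothetical names made up for this example. Typed registration fills in the metadata from annotations, untyped registration leaves it empty, and a generator can only emit a binding stub when the metadata is present.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Signature:
    """Hypothetical metadata record: argument type names and return type name."""
    arg_types: Tuple[str, ...]
    ret_type: str


@dataclass
class PackedFunc:
    """Toy stand-in for a type-erased function with optional metadata."""
    body: Callable
    signature: Optional[Signature] = None  # None means "no metadata available"

    def __call__(self, *args):
        return self.body(*args)


REGISTRY: Dict[str, PackedFunc] = {}


def register_typed(name: str, fn: Callable) -> PackedFunc:
    """Register fn and populate its signature from its type annotations."""
    sig = inspect.signature(fn)
    arg_types = tuple(p.annotation.__name__ for p in sig.parameters.values())
    packed = PackedFunc(fn, Signature(arg_types, sig.return_annotation.__name__))
    REGISTRY[name] = packed
    return packed


def register_untyped(name: str, fn: Callable) -> PackedFunc:
    """Register fn without any signature metadata."""
    packed = PackedFunc(fn)
    REGISTRY[name] = packed
    return packed


# Hypothetical mapping from metadata type names to Rust type names.
RUST_TYPES = {"int": "i64", "float": "f64", "str": "String"}


def emit_rust_stub(name: str) -> Optional[str]:
    """Return a one-line Rust-like declaration, or None without metadata."""
    sig = REGISTRY[name].signature
    if sig is None:
        return None
    args = ", ".join(f"a{i}: {RUST_TYPES[t]}" for i, t in enumerate(sig.arg_types))
    return f"pub fn {name}({args}) -> {RUST_TYPES[sig.ret_type]};"


def add(x: int, y: int) -> int:
    return x + y


register_typed("add", add)
register_untyped("raw", lambda *args: len(args))

print(emit_rust_stub("add"))
print(emit_rust_stub("raw"))
```

Functions registered without metadata keep working as before; they are simply skipped by the generator, which is what makes the field safe to leave optional.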

The downside of this idea is that it would result in some fairly hairy codegen, which can be difficult to maintain. It would also add a small amount of overhead to the TVM runtime; we could mitigate that by adding a `#define` to disable the type metadata. Also, as @tqchen pointed out, the generated code might be lower-quality than handwritten bindings.

A few extensions of this idea:
- Add more compile-time metadata to the Node hierarchy, allowing codegen to access their methods / attributes.
- Add docstrings to PackedFuncs to allow auto-generation of documentation.
- Allow `std::variant` or equivalent in `TypedPackedFunc` signatures. Lots of PackedFuncs have arguments that can be one of a few types (e.g. Int or Expr); a simple extension to the PackedFunc system + runtime type system would allow these to be described automatically. 
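
The last extension can be sketched in Python, using `typing.Union` as a stand-in for `std::variant`. This is only an analogy under stated assumptions: `Expr`, `describe`, `check_args`, and `make_const` are hypothetical names, and the runtime check is far simpler than anything the PackedFunc system would need.

```python
from typing import Union, get_args, get_origin, get_type_hints


class Expr:
    """Hypothetical stand-in for an IR expression node."""

    def __init__(self, text):
        self.text = text


def _accepted_types(hint):
    """Expand a Union annotation into its member types; wrap anything else."""
    return get_args(hint) if get_origin(hint) is Union else (hint,)


def describe(fn):
    """Return {parameter name: tuple of accepted type names} for fn."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    return {
        name: tuple(t.__name__ for t in _accepted_types(hint))
        for name, hint in hints.items()
    }


def check_args(fn, *args):
    """Call fn after checking each positional argument against its annotation."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    for (name, hint), value in zip(hints.items(), args):
        if not isinstance(value, _accepted_types(hint)):
            raise TypeError(f"{name}: unexpected type {type(value).__name__}")
    return fn(*args)


def make_const(value: Union[int, Expr]) -> Expr:
    """Accept either an int or an Expr, as many PackedFuncs do."""
    return value if isinstance(value, Expr) else Expr(str(value))


print(describe(make_const))
print(check_args(make_const, 3).text)
```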

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by Tianqi Chen <no...@github.com>.
One thing I would like to mention is that manual wrapping has its merit :)

Take the following wrapper code for scan as an example. By manually wrapping it in Python, we benefit from:
- A clear signature that users can look up
- Keyword arguments and default values
- Python-specific documentation, in particular the type signature and, most importantly, code examples.

```python
def scan(init, update, state_placeholder, inputs=None, name="scan", tag="", attrs=None):
    """Construct new tensors by scanning over axis.

    Parameters
    ----------
    init: Tensor or list of Tensor
        The initial condition of first init.shape[0] timestamps
    update: Tensor or list of Tensor
        The update rule of the scan given by symbolic tensor.
    state_placeholder: Tensor or list of Tensor
        The placeholder variables used by update.
    inputs: Tensor or list of Tensor, optional
        The list of inputs to the scan. This is not required, but can
        be useful for the compiler to detect scan body faster.
    name: str, optional
        The name hint of the tensor
    tag: str, optional
        Additional tag information about the compute.
    attrs: dict, optional
        The additional auxiliary attributes about the compute.

    Returns
    -------
    tensor: Tensor or list of Tensors
        The created tensor or tuple of tensors if it contains multiple outputs.

    Example
    -------
    .. code-block:: python

      # The following code is equivalent to numpy.cumsum
      m = tvm.var("m")
      n = tvm.var("n")
      X = tvm.placeholder((m, n), name="X")
      s_state = tvm.placeholder((m, n))
      s_init = tvm.compute((1, n), lambda _, i: X[0, i])
      s_update = tvm.compute((m, n), lambda t, i: s_state[t-1, i] + X[t, i])
      res = tvm.scan(s_init, s_update, s_state, X)
    """
    ...  # function body omitted in the original post
```

While it is possible to have a system that generates these components, that might mean we have to write the Python examples and documentation somewhere else, which is less natural.

Of course, one drawback of manual wrapping is that such a wrapper has to be created for each language. This may not be a bad thing, especially if we want to think clearly about language-specific features and write good documentation.

I do think there is some merit in having good metadata for the node system and automatically generating some of the accessors.



Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-480655160

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by James Gilles <no...@github.com>.
From other thread:

@tqchen:
> I will summarize some of my take here. I like the idea of Node hierarchy compile time generation. This is something I have thought about and discussed with @jroesch for a while and might help [#2523 (comment)](https://github.com/dmlc/tvm/issues/2523#issuecomment-458821056)
> 
> It is always tempting to automate more parts of wrapper generation. However, our past experience suggests that automatic wrapper generation is never perfect. Think about how we can support keyword arguments, good Pythonic docstrings, and so on. It is also harder for developers to find the actual implementation of the "generated API", since some of it is generated at runtime. Eventually, we found that it is simpler to just do manual wrapping, which gives us all the good native features and docs, and keeps PackedFunc simple (by only supporting positional arguments without any metadata).

@nhynes:
> This idea actually comes up a lot :P [#2328 (comment)](https://github.com/dmlc/tvm/pull/2328#issuecomment-450001679)
> 
> I know, for sure, that we could get good docs with _really good_ codegen like that offered by Rust macros, but I also know for sure that we're not about to rewrite TVM in Rust :)
> 
> I think that the boilerplate really does bother new (advanced) users who want to use TVM as a tool. I wonder if there's a way forward here that satisfies all desiderata?

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-480655648

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by James Gilles <no...@github.com>.
I'm starting work in https://github.com/dmlc/HalideIR/pull/56

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-482346276

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by James Gilles <no...@github.com>.
Oh hm, that's an interesting idea. How would we handle inheritance / methods on nodes? Just copy-pasting stuff in the generator, or some sort of runtime lookup system implemented in C?

Another route to ABI compatibility is to just use opaque pointers + accessors everywhere; that might be a less invasive change. It does impose more overhead, though.
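
A toy Python sketch of the opaque-handle route, to show where the overhead comes from. Everything here is hypothetical (`node_create`, `node_type`, `node_get`, `node_free`, and the handle table are invented for illustration): callers only ever hold an integer handle, and every field read goes through an accessor call instead of a direct memory access.

```python
import itertools

# Runtime-owned storage; callers never see the objects themselves.
_objects = {}
_next_handle = itertools.count(1)


def node_create(type_name, **fields):
    """Allocate a node inside the runtime and return an opaque integer handle."""
    handle = next(_next_handle)
    _objects[handle] = (type_name, dict(fields))
    return handle


def node_type(handle):
    """Accessor: the type name of the node behind the handle."""
    return _objects[handle][0]


def node_get(handle, field):
    """Accessor: one lookup plus one call per field read (the extra overhead)."""
    return _objects[handle][1][field]


def node_free(handle):
    """Release the node; the handle becomes invalid afterwards."""
    del _objects[handle]


lhs = node_create("IntImm", value=1)
rhs = node_create("IntImm", value=2)
add = node_create("Add", lhs=lhs, rhs=rhs)
print(node_type(add), node_get(node_get(add, "rhs"), "value"))
```

Because the layout of the stored node never crosses the boundary, it can change without breaking callers, which is the ABI-compatibility benefit being weighed against the per-access cost.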

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-480901935

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by James Gilles <no...@github.com>.
How does the refcounting for Nodes currently work across the ABI boundary? Is everything owned by the TVM runtime or something?

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-481045275

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by James Gilles <no...@github.com>.
`NodeRef` is just a type-erased version of `NodePtr<T>`, right?

I think the main question is if we want to allow nodes to be implemented in languages besides C++. If we require nodes to be implemented as `Node` subclasses, then they have to have C++ impls for inheritance to work. If we design a `Node` C API that doesn't have this requirement, then we can mix-and-match Node implementations between languages. That's really only useful for adding temporary node types for e.g. passes implemented in python though.

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-482296953

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by Tianqi Chen <no...@github.com>.
RE: Node hierarchy. One ultimate direction is to use a DSL (for example, Python with typing info) to declare the node relations and fields, and to create a generator utility that generates the classes. The main challenge is to keep most of the data structures purely in C (so we can get ABI compatibility across languages). We also want to keep some of the node base classes and the possibility of declaring a node inside C++ for private temporary nodes. Such nodes would not enjoy the cross-language features, but they could easily use any language-specific data structure.




Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-480687154

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by James Gilles <no...@github.com>.
I could set it up so that most of the node generation logic + code is in HalideIR, and then there's a generator script for each set of nodes needed (one here, one there). I might open a PR over there and start work.

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-482275032

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by James Gilles <no...@github.com>.
@tqchen, yeah, those are good points. One possibility would be to use hand-written wrappers for first-tier-supported languages like Python and offer the auto-generated wrappers for other languages.

If you did end up switching fully to auto-generated wrappers, you could take steps like committing generated code to the git repo to make it easier to browse, with doc comments leading back to the original source code. Ultimately, though, it'll always be less natural than hand-written wrappers; that's definitely true.

RE: the Node hierarchy, it seems like doing more with that is an unambiguous win. I could imagine setting up a system like the `TVM_REGISTER_GLOBAL` macro + maybe macros for declaring attributes so that you don't have to hand-code `VisitAttrs`. Did you have other thoughts on that design?

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-480657099

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by Tianqi Chen <no...@github.com>.
Closed #2983.

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#event-2464106180

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by Tianqi Chen <no...@github.com>.
We will likely give up methods in most cases and use functions/vtables instead. This should be fine for common IR structures, and it also seems to be the natural approach in certain languages. Methods can still be used if we manually declare the node in C++.
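
One way to picture "functions/vtable instead of methods" is a dispatch table keyed by node type, sketched below in Python. The `FunctionTable` class and the `evaluate` function are hypothetical names for illustration; the node classes carry only data, and behavior is attached from the outside.

```python
from dataclasses import dataclass


@dataclass
class IntImm:
    value: int


@dataclass
class Add:
    lhs: object
    rhs: object


class FunctionTable:
    """Hypothetical dispatch table: maps a node type to a free function."""

    def __init__(self, name):
        self.name = name
        self._impls = {}

    def register(self, node_type):
        def decorator(fn):
            self._impls[node_type] = fn
            return fn
        return decorator

    def __call__(self, node, *args):
        impl = self._impls.get(type(node))
        if impl is None:
            raise TypeError(f"{self.name}: no entry for {type(node).__name__}")
        return impl(node, *args)


evaluate = FunctionTable("evaluate")


@evaluate.register(IntImm)
def evaluate_int_imm(node):
    return node.value


@evaluate.register(Add)
def evaluate_add(node):
    return evaluate(node.lhs) + evaluate(node.rhs)


print(evaluate(Add(IntImm(2), Add(IntImm(3), IntImm(4)))))
```

Since the nodes themselves hold no methods, they could be generated as plain data structures, and each language could register its own table entries.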

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-480914989

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by Junru Shao <no...@github.com>.
I feel like `Node`, `NodeRef` and `NodePtr` are pretty much general-purpose refcounted objects, rather than specific nodes in the IR. They can be shared across the ABI boundary using PackedFunc out of the box. It would be interesting to see if we could inherit every object from `Node` so that everything automatically gets the ability to be shared across different languages.

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-482278143

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by Tianqi Chen <no...@github.com>.
https://github.com/dmlc/tvm/issues/3501

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-508953907

Re: [dmlc/tvm] [RFC] More PackedFunc metadata (#2983)

Posted by Tianqi Chen <no...@github.com>.
Things should work out of the box, as long as each side has the same way to access the ref counter. We increase/decrease the ref counter from each language, and that ref counter is global to all the languages. Because a node has a deleter that is populated during creation, deletion can safely run according to the language-specific behavior of whichever side created the node.
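
The scheme described above can be modeled in a few lines of Python. This is a simplified illustration, not the runtime's actual implementation: `Node`, `inc_ref`, `dec_ref`, and `Handle` are hypothetical names, and the two handles stand in for references held by two different language frontends.

```python
class Node:
    """Toy node: one shared counter plus a deleter chosen by the creator."""

    def __init__(self, payload, deleter):
        self.payload = payload
        self.ref_count = 0
        self._deleter = deleter  # populated at creation time


def inc_ref(node):
    node.ref_count += 1


def dec_ref(node):
    node.ref_count -= 1
    if node.ref_count == 0:
        # Whoever drops the last reference triggers the creator's deleter.
        node._deleter(node)


class Handle:
    """A reference held by some language frontend."""

    def __init__(self, node):
        self.node = node
        inc_ref(node)

    def release(self):
        if self.node is not None:
            dec_ref(self.node)
            self.node = None


freed = []
node = Node("tensor", deleter=lambda n: freed.append(("creator side", n.payload)))

python_side = Handle(node)  # e.g. a reference held by Python
other_side = Handle(node)   # e.g. a reference held by another frontend

python_side.release()
print(freed)  # still empty: the other side holds a reference
other_side.release()
print(freed)  # the creator's deleter has now run exactly once
```

The side that releases last does not need to know how the node was allocated; it only needs to agree on how to reach the counter and the deleter.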

Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2983#issuecomment-481045962