Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/11/25 17:24:38 UTC

[GitHub] [tvm] vegaluisjose commented on pull request #6971: [Hardware][Verilator] Integrating and simulating hardware accelerators in TVM

vegaluisjose commented on pull request #6971:
URL: https://github.com/apache/tvm/pull/6971#issuecomment-733843930


   > Hi @vegaluisjose ,
   > 
   > This looks very interesting! I would suggest to move the RFC discussion in https://discuss.tvm.apache.org/. Meanwhile, let me double check I understood the design. So, the idea is:
   > a) Write Verilog
   > b) Compile (e.g., with Verilator) in a C++ Cycle Accurate Model
   > c) Link that model into TVM
   > You are using BYOC so that you can split your graph on the operations you want, and offload the one you decided to support (potentially, the entire graph) to the C++ model. Is my understanding correct?
   > 
   > Two questions:
   > a) How hard is to open a route in TIR for this (to test smaller hw parts)? In theory `call_extern` should be sufficient?
   > b) Why adding `hw-widgets` as a dependency? IIUC this is just an example, right?
   
   Hi @giuseros ,
   
   Yes to all of your points (a, b, c). This is an extension of work we started exploring earlier this [year](https://homes.cs.washington.edu/~vegaluis/pdf/ieeemicro20_vega_lastlayer.pdf). The main difference between the paper and this RFC is that we let the hardware engineer choose the level of detail of the simulated object. For example, if you have hardware code for simulating complex protocols, e.g., AXI, then you could implement such protocol transactions using the proposed device interface. On the other hand, if you don't have such models, or you want to bypass them to speed up simulation, then you can use hierarchical paths to access memories or registers in your design and control it the same way I do in the current demo. The benefit of this *freedom* is that interoperability doesn't depend on your protocol of choice: you can interact with your design in whatever way is most productive for your use case.
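
   To make the two levels of detail concrete, here is a minimal pure-Python sketch. It is not the device interface from this PR: `ScalarAdderModel`, `DirectDevice`, `BusDevice`, the `top.adder.*` paths, and the fixed two-cycle bus latency are all hypothetical stand-ins, and the adder model only mimics what a Verilator-generated model would do.

```python
class ScalarAdderModel:
    """Hypothetical stand-in for a Verilator-generated cycle-accurate model."""

    def __init__(self):
        self.regs = {"a": 0, "b": 0, "y": 0}
        self.cycles = 0

    def tick(self):
        # One clock cycle of a registered 32-bit adder.
        self.regs["y"] = (self.regs["a"] + self.regs["b"]) & 0xFFFFFFFF
        self.cycles += 1


class DirectDevice:
    """Bypasses any bus protocol and pokes registers by hierarchical path."""

    def __init__(self, model):
        self.model = model

    def write(self, path, value):
        # Only the last path component names the register in this toy model.
        self.model.regs[path.split(".")[-1]] = value

    def read(self, path):
        return self.model.regs[path.split(".")[-1]]

    def run(self, cycles):
        for _ in range(cycles):
            self.model.tick()


class BusDevice(DirectDevice):
    """Same interface, but every access pays a fixed handshake latency.

    This is an assumed, highly simplified substitute for a real protocol
    such as AXI: it models only the cycle cost, not the signalling.
    """

    LATENCY = 2

    def write(self, path, value):
        self.run(self.LATENCY)
        super().write(path, value)

    def read(self, path):
        self.run(self.LATENCY)
        return super().read(path)


def add(dev, a, b):
    """Driver code is identical for both devices."""
    dev.write("top.adder.a", a)
    dev.write("top.adder.b", b)
    dev.run(1)
    return dev.read("top.adder.y")


direct_model, bus_model = ScalarAdderModel(), ScalarAdderModel()
direct_result = add(DirectDevice(direct_model), 3, 4)
bus_result = add(BusDevice(bus_model), 3, 4)
```

   Both devices return the same sum, but the direct one gets there in fewer simulated cycles, which is the trade-off between protocol fidelity and simulation speed described above.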
   
   In terms of the other questions,
   
   a) I haven't explored the TIR path yet. However, the benefit of doing so would be leveraging other utilities available in the framework, e.g., AutoTVM. In other words, BYOC could be considered a shallower integration, but it is still useful.
   
   b) The reason we have the `hw-widgets` repo is that this is hard to explain without a running example, given that we are at the hardware-software boundary. I am working on a blog post, to be published after this gets merged, that not only covers all of this but also shows how to offload operators from ML models. For example, we have already tested offloading `nn.bias_add()` from a quantized MobileNet model to the same *scalar adder* used in this example. As @liangfu said, this is a starting point for others.
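
   As a rough illustration of how a bias add can be served by a scalar adder, the sketch below decomposes the operator into one adder invocation per element. It is plain Python, not the code in this PR: `scalar_add` is a hypothetical stand-in for a single call into the simulated hardware, and the sketch assumes a 2-D integer input with the bias applied along axis 1 and 32-bit signed wraparound.

```python
def scalar_add(a, b):
    """Hypothetical stand-in for one invocation of the hardware scalar adder.

    Assumes 32-bit two's-complement arithmetic with wraparound.
    """
    s = (a + b) & 0xFFFFFFFF
    return s - (1 << 32) if s & 0x80000000 else s


def bias_add(data, bias):
    """Add bias[c] to every element in column c, one adder call per element."""
    return [
        [scalar_add(x, bias[c]) for c, x in enumerate(row)]
        for row in data
    ]


data = [[1, 2], [3, 4]]
bias = [10, -20]
result = bias_add(data, bias)
```

   A single adder makes the number of hardware invocations equal to the number of elements, which is slow but keeps the example small enough to trace end to end.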
   
   Finally, this approach is not limited to ML accelerators; you can imagine integrating a bare-metal processor in a similar fashion. There are plenty of RISC-V processors that use Verilator for simulation, so that could happen too. I am giving a short talk at TVM Conf about this RFC.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org