Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/10/22 11:14:21 UTC

[GitHub] [incubator-tvm] u99127 commented on pull request #6703: [µTVM] Add virtual machine, test zephyr runtime on real hardware

u99127 commented on pull request #6703:
URL: https://github.com/apache/incubator-tvm/pull/6703#issuecomment-714421195


   hey Andrew,
   
   Thanks a lot for your reply. 
   
   > @u99127 some early replies.
   
   > 
   > > I've spent some time today attempting to review this PR. I would like to understand more about how this pushes things forward with the uTVM integration. Can you link this to an RFC that describes this for posterity and how it links with the overall plan for uTVM?
   > 
   > sure, happy to write one. briefly: this makes it easy to use Zephyr, so it generally aligns with the [Standalone µTVM roadmap](https://discuss.tvm.apache.org/t/rfc-tvm-standalone-tvm-roadmap/6987), though happy to admit an RFC with an added level of detail would help.
   > 
   > > At the end of this PR getting merged, is the expectation that a user of TVM gets a shell-like interface to interact with a board and be able to interactively run machine learning models on the board using Zephyr and uTVM (well, really autotuning at this point in time)?
   > 
   > correct. Zephyr has a lot of dependencies, and additionally, setting up the TVM virtualenv is not trivial now and is made especially hard by the need to install the additional Zephyr dependencies. the VM solves this in a cross-platform way. it's needed (vs a docker container) because docker can't correctly virtualize any piece of attached hardware that can't be controlled solely with libusb (i.e. virtual com ports can't be used from docker containers).
   > 
   > > This appears to mix multiple changes which could be staged into one giant PR.
   > > I can see the following changes here .
   > > 
   > > 1. New CI infrastructure with vagrant boxes,
   > 
   > I don't believe we're adding any CI infrastructure here. The [previous change](https://github.com/apache/incubator-tvm/pull/6603) added QEMU support to the CI plus the current test. This PR proves that the test can be used with both QEMU and real hardware with a minor 2-line change (this is why this PR adds parameterization to `test_zephyr`).
   > 
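   For context, the parameterization mentioned above would presumably look something like the sketch below. This is only an illustration of the idea: the platform names and the fixture plumbing are assumptions on my part, not the exact code added to `test_zephyr` in this PR.
   
   ```python
   # Sketch: run the same Zephyr runtime test against QEMU and a real board.
   # "qemu_x86" is a Zephyr QEMU target; the Nucleo entry stands in for
   # whatever attached hardware is being tested. Both names are illustrative.
   import pytest
   
   PLATFORMS = [
       "qemu_x86",            # simulated target, as already exercised in CI
       "stm32f746xx_nucleo",  # example of a real attached board
   ]
   
   
   @pytest.mark.parametrize("platform", PLATFORMS)
   def test_compile_runtime(platform):
       # Build the Zephyr runtime for `platform`, flash it (or boot QEMU),
       # and run the same session-level assertions in either case.
       ...
   ```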
   > > 1. changes to existing uTVM transport server protocols
   > 
   > No protocol level changes were made here, but I did add the `SerialTransport` implementation and a way to automatically launch GDB to debug the attached Cortex board. I think more documentation would make this clear. I can add a doc explaining how to use this change.
   > 
   
   I think this is worth splitting out into its own PR since it seems independent? The smaller the PR, the easier it is to review.
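   To make the transport discussion above concrete, a serial-backed transport would look roughly like the sketch below. This assumes a pyserial backend, and the class and method names are illustrative rather than the ones actually used in the PR.
   
   ```python
   # Sketch of a serial transport: open/close the port and move raw bytes.
   # A GDB launcher would sit alongside this, attaching to the same board.
   import serial  # pyserial
   
   
   class SerialTransport:
       """Illustrative transport that reads/writes a serial (COM) port."""
   
       def __init__(self, port, baudrate=115200, timeout=5.0):
           self._port_name = port
           self._baudrate = baudrate
           self._timeout = timeout
           self._port = None
   
       def open(self):
           self._port = serial.Serial(
               self._port_name, baudrate=self._baudrate, timeout=self._timeout)
   
       def close(self):
           if self._port is not None:
               self._port.close()
               self._port = None
   
       def read(self, n):
           # Returns up to n bytes; fewer if the timeout expires first.
           return self._port.read(n)
   
       def write(self, data):
           return self._port.write(data)
   ```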
   
   
   > > 1. changes to LLVM's target parsing, which is really common between the Arm and AArch64 targets AFAIK, and the behaviour of mfpu is very different between the two. Thus that can be an independent PR in its own right.
   > 
   > actually I can split this out. originally I was thinking that `test_zephyr` required floating point support, but it uses integers.
   
   Please can we split it out? Unrelated changes in the same PR make review difficult and painful, mean things slip through review, and are harder on the reviewer.
   
   > 
   > > 4 ... I'm sure I've missed some changes relating to Zephyr or others in there.
   > 
   > A couple of bugfixes:
   > 
   >     * if an Exception is raised while constructing the Session but after the transport has been initialized, de-initialize the transport
   > 
   >     * fix behavior of MicroSession when using it concurrently with an attached debugger. Specifically, if no retry timeout is specified, StartSession would still time out after the retry timeout.
   
   Can we please pull these out? One PR per bug fix is really what I would suggest.
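   For readers following along, both fixes are of a familiar shape; a minimal sketch is below. The class and method names are hypothetical, not the PR's exact code.
   
   ```python
   # Sketch of the two fixes described above, with invented names.
   class Session:
       def __init__(self, transport, retry_timeout=None):
           transport.open()
           try:
               # Fix 2: only apply the retry timeout when one was actually
               # given, so a session attached to a debugger (no timeout
               # specified) can wait indefinitely instead of timing out.
               self._start_session(transport, retry_timeout)  # may raise
           except Exception:
               # Fix 1: if construction fails after the transport has been
               # initialized, de-initialize it before propagating the error.
               transport.close()
               raise
           self._transport = transport
   
       def _start_session(self, transport, retry_timeout):
           ...
   ```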
   
   > 
   > 
   > > I'm not going to be able to review items 2, 3 and 4 on this list tonight and will restrain myself to just looking at the vagrant bits to understand more about how this looks.
   > > My understanding of the choice of vagrant is that it is useful as a different way of connecting to peripherals, which is hard to manage in a docker container. Is that right?
   > 
   > Vagrant is mainly useful to codify the virtual machine configuration. you're right that we use the VM to connect to peripherals because docker isn't any good for that. there are 2 virtual machine configs stored: a "base box" which is stored [here](https://app.vagrantup.com/areusch/boxes/microtvm-staging) (we need to find a better home for it), and a "vm" which is what the user's supposed to use. the base box just needs to be updated like the CI docker containers: when dependencies change. the vm is built by every user by issuing:
   > `$ vagrant up --provider={parallels,virtualbox,...}`
   > in the `apps/microtvm-vm/vm` directory.
   > 
   > this builds the user's TVM installation using the VM's gcc toolchain.
   
   
   I'll need to read through this again but it won't be till later tonight.
   
   > 
   > > With the vagrant box setup, what is the thinking with respect to those developers, or those developers using CI, who already run inside virtual machines? Would this naturally nest and work out of the box? For instance, if there was a developer wanting to develop uTVM on a GNU/Linux VM on a Windows laptop, how would this setup work for them? Or indeed, if one were to run CI systems that used openstack, would vagrant play nicely on those CI systems?
   > 
   > Right now the VM doesn't have any relation to the CI. You could imagine a future where part of the CI is spinning up the VM and launching a Jenkins instance. Instead, we build a docker image with QEMU and Zephyr, but no utilities for connecting to peripherals.
   > 
   > I believe it should be possible to launch the CI from inside the VM if you installed docker--we configure this now. Vagrant is supported on Windows and it should be possible to use it with VirtualBox to develop for µTVM. I haven't tested that, though. I don't know if the VM can nest on openstack, but as we don't plan to use it in CI yet, I don't think this adds any additional concerns to the problem of running the TVM CI on openstack. Let me know if I've misunderstood your question.
   > 
   
   We do internally run our TVM CI on openstack or other VMs for development, so the deployment flow here would certainly need to be thought through. I'm not sure about other organizations and how they organize their CI, but we've certainly tried to use the building blocks from the upstream CI, launching jobs on internal workers that are either VM guests or physical machines.
   
   @leandron should certainly look at this too.
   
   > > This was a first pass through to see what it looks like, but I'm not terribly happy with my own understanding of it.
   > > Finally, are there plans for a tutorial in the source base about how to actually use this, since this PR seems to be short on any documentation for folks wanting to try it out? How can a developer try this out?
   > 
   > Yes, definitely. I have one nearly ready--I should just merge it with this PR. I'll iterate on that.
   
   I think this should be split up into at least 4 or 5 PRs to make it manageable:
   
   1. Vagrant box changes with the tutorial, as I think this is the biggest item here.
   2. GDB changes.
   3. LLVM changes.
   4. Zephyr changes.
   5. The two bug fixes you mention.
   6. Anything else that I've missed.
   
   regards
   Ramana
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org