Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/09/20 20:52:55 UTC

[GitHub] [tvm] areusch edited a comment on issue #9022: [Bug] BuiltinLower has hard-coded heuristic for alloca which is not appropriate for all kDLCPU target devices

areusch edited a comment on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-923290583


   apologies for the delay in replying here.
   
   I agree that allocating on the stack is, in the current GraphExecutor world, a codegen-level optimization. In my mind, the introduction of AOTExecutor (and actually also VMExecutor) means that this decision is moving one level up. Here is why:
   - previously, the maximum effective call depth of a TIR function was 1, meaning that codegen could decide *then and there* that it made sense to move a particular local `tir.allocate` node onto the stack.
   - with the introduction of AOT and the control-flow-capable VMExecutor, such a `tir.allocate` may now be live while another TIR function is being invoked, e.g. via `tir.call_packed`; see the sketch below.
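   
   To make the lifetime problem concrete, here is a rough TVMScript-style sketch of the second case. This is a simplified illustration rather than a verbatim TVM program, and the packed function name is hypothetical:
   
   ```python
   import tvm
   from tvm.script import tir as T
   
   @T.prim_func
   def caller(out: T.Buffer((16,), "float32")):
       # Local scratch buffer. A codegen-level heuristic might decide, in
       # isolation, to place this on the C stack of the generated function.
       scratch = T.allocate([16], "float32", "global")
       # The allocation is still live while the callee runs. Under AOT/VM,
       # the callee is another TIR function that may itself allocate, so the
       # "call depth is 1" reasoning behind the heuristic no longer holds.
       T.evaluate(T.call_packed("hypothetical_packed_op", scratch, out.data))
   ```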
   
   Now that this is the case, it no longer makes sense to make stack allocation of `tir.allocate` a codegen-level optimization. It must be decided prior to codegen using graph-level memory analysis, e.g. USMP (the Unified Static Memory Planner).
   
   The question then is how to model this in the program. I think using `storage_scope` does make sense. The definition of a `storage_scope` I've been working with is:
   > A contiguous region of memory, potentially with a maximum size, which can be used by operator implementations on a set of `tvm.Device`.
   
   Under this definition, a special `<context>.stack` `storage_scope` could be created with a target-specific bound on the amount of memory available. I do not think we should create a default `global` scope anywhere, as it's unlikely a scope is truly ever global except in the case of single-CPU homogeneous model execution. We just haven't modeled this yet in TVM, as we don't have an explicit mapping indicating which storage_scopes are accessible by which device. The closest thing to a global scope I can think of would be something like an `executor` scope, which could be used to place the `DLTensor`s used by the AOT executor for intermediate tensors.
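   
   As a minimal sketch of the mechanism: TIR buffers already carry a storage scope string today, so a bounded stack scope could ride on the same field. The `always-on-cpu.stack` name below is an assumption of this proposal, not a scope TVM currently registers:
   
   ```python
   import tvm
   
   # Existing behavior: every TIR buffer carries a storage scope string;
   # "global" is the conventional CPU-visible default.
   workspace = tvm.tir.decl_buffer((256,), "int8", name="workspace", scope="global")
   
   # Hypothetical: under this proposal, a buffer pinned to a bounded,
   # device-specific stack region would be tagged with a scope like this.
   # Both the scope name and its implied size bound are assumptions.
   scratch = tvm.tir.decl_buffer((256,), "int8", name="scratch", scope="always-on-cpu.stack")
   ```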
   
   Something we still have yet to address is how to configure the `storage_scope`s made available for a particular Target. I think Target-specific attributes may not quite cut it, as `llvm` would obviously have different amounts of stack RAM available based on the selected CPU. The user will also need to be able to override this, since linking with different software libraries changes the stack RAM available (e.g. some libraries require more global memory and therefore take away from the RAM available for stack usage; others require massive interrupt stacks, and a larger budget must be reserved to mitigate the risk of an interrupt overwriting global memory).
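   
   As a hedged illustration of the problem (plain data, not any real TVM API): the same target kind can imply very different stack budgets, and link-time choices shift them further, so a fixed per-Target attribute is a poor fit:
   
   ```python
   # Illustrative numbers only; none of this is an existing TVM interface.
   stack_budget_by_cpu = {
       "cortex-m0": 4 * 1024,   # small MCU: only a few KB of stack available
       "cortex-m7": 64 * 1024,  # larger part: much more RAM to budget from
   }
   
   # Link-time facts shrink the budget further, e.g. a library that requires
   # a large interrupt stack forces a conservative reserve.
   interrupt_stack_reserve = 8 * 1024
   usable_stack = stack_budget_by_cpu["cortex-m7"] - interrupt_stack_reserve
   ```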
   
   It's likely that a scalable solution in micro-land is to have the Project API server provide a boilerplate memory layout in e.g. a `server_info_query` call; we will then likely also need a way for users to override this.
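   
   For concreteness, a reply from such a server might look like the sketch below. `server_info_query` is an existing Project API method, but the `memory_layout` field and its shape are hypothetical extensions:
   
   ```python
   # Hypothetical extension of the Project API server_info_query reply.
   # Only "memory_layout" is new here; its structure is an assumption.
   server_info = {
       "platform_name": "example_board",  # illustrative
       "memory_layout": {
           "always-on-cpu.stack": {"size_bytes": 32 * 1024},
           "dsp.stack": {"size_bytes": 64 * 1024},
       },
   }
   
   # A user override hook (also hypothetical) could then shrink a budget
   # when, e.g., extra libraries are linked in.
   server_info["memory_layout"]["always-on-cpu.stack"]["size_bytes"] = 24 * 1024
   ```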
   
   Lastly, as to the `<context>` prefix mentioned above: there are many cases in which multiple stacks exist on a device:
   - multi-core execution
   - parallel thread execution (e.g. each thread's stack acting as a sort of thread-local storage)
   
   in the former case, it's likely that `<context>` should be replaced with the `name` given to the TargetDevice, e.g. in https://github.com/apache/tvm/pull/8892; for example, `dsp.stack` or `always-on-cpu.stack`.
   
   in the latter case, we probably additionally need a thread identifier, e.g. `dsp.thread0.stack`.
   
   Until we have USMP, my thought is that the short-term solution should be to stick with a codegen-level optimization and add an attribute that can be used to disable the stack-allocation optimization (sketched below). What do you think @tqchen @manupa-arm?
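   
   For concreteness, the opt-out could look something like this; the attribute name is invented for illustration, not an existing TVM attribute:
   
   ```python
   import tvm
   from tvm.script import tir as T
   
   @T.prim_func
   def identity(a: T.Buffer((4,), "float32"), b: T.Buffer((4,), "float32")):
       for i in range(4):
           b[i] = a[i]
   
   # Hypothetical opt-out: attach an attribute that BuiltinLower would
   # consult before applying its stack-allocation heuristic. The attribute
   # name is an assumption; no such attribute exists today.
   identity = identity.with_attr("tir.disable_stack_alloca", True)
   ```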

