You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/09/17 13:10:58 UTC

[GitHub] [tvm] manupa-arm edited a comment on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

manupa-arm edited a comment on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-921782373


   @tqchen ,
   
   > To say it in another way, we cannot say that "global" definitely mean no stack allocation.
   
   The current issue is in the device "CPU" && 'global' certainly means its definitely stack allocation if its less that heuristic size and not the other way around.
   
    > If the code needs to impose additional constraint that the memory must be accessible from a separate device(e.g. NPU), it certainly would require a more specialized constraint that is better spelled out explicitly.
   
   > As we can see that this is another kind of flexibility we want to enable here -- flexibility of picking possible backend allocation implementations without over constraining the code generator to a backend specific behavior that is platform dependent (like the case of pinned memory
   
   Yes this is something we want eventually and we will be working towards achieving with USMP work.
   
   Until we have that, the natural assumption should be in absense of a 'constraint' that the memories are more accessible rather than being less accessible (e.g. stack). Its unfortunate that the current design prefers the latter especially in a absense of a constraint.
   
   @mbs-octoml ,
   
   ### Short term solution :
   I think you are right, we might want to unblock this using a target-dependent kMaxStackAllocaSize. 
   
   May I ask why  was the default chosen to be this ?
   https://github.com/apache/tvm/blob/1fd8f610953adc39cbd18d82f4a9e92a11575dfc/include/tvm/runtime/device_api.h#L60-L61
   
   Its interesting because the stack size go up beyond that size as it is just looking on a single allocate at a time. i.e. you could have multiple allocates that are less than < 1024. So the stack usage is not even bounded by the current approach.
   
   Therefore, to both unlock us with Ethos-U and also somewhat solve the problem that current micro builds using stack for tensors < 1024 instead of the workspace buffer provided, maybe we should just make kMaxStackAllocaSize=0 (a binary decision rather than a value range).
   
   @Mousius @leandron @areusch , this means there is going to be another argument for a simple micro deployment to be added to the already long list of arguments. Something like "--use-external-workspace" ? 
   
   @tqchen , I still feel it would have been super helpful that kMaxStackAllocaSize is by default zero but with the option of going higher based on a user argument. e.g. --max-stack-alloca-size=1024. It is not very convincing that we are leaving out stack allocation of tensors with the prospect of being optimized by mem2reg without doing any marking (i.e. special storage scope).
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org