Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/09/01 09:36:59 UTC

[GitHub] [tvm-rfcs] manupa-arm commented on a change in pull request #22: [RFC][TIR] TIR Non-scalar Constants

manupa-arm commented on a change in pull request #22:
URL: https://github.com/apache/tvm-rfcs/pull/22#discussion_r699187806



##########
File path: rfcs/0022-tir-non-scalar-constants.md
##########
@@ -0,0 +1,107 @@
+
+- Feature Name: tir_non_scalar_constants
+- Start Date: 2021-06-01
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/22
+- GitHub Issue: TBD
+
+# 1. Summary
+
+This RFC proposes how non-scalar constants could be represented in TIR and used by passes in the lowering process.
+
+# 2. Motivation 
+
+Currently, the non-scalar constants could be represented in Relay (relay.Constant) to be used by relay passes but not in TIR. Therefore, when performing lowering using TIR passes, we have to maintain a side-channel of tir::Var to constant non-scalar data mapping to perform transformations that could use the knowledge where some of the data are constants.
+
+Few example scenarios as further motivation :
+
+## Weight compression
+
+When lowering for accelerators (E.g. : [Arm(R) Ethos(TM)-U NPU](https://github.com/apache/tvm-rfcs/pull/11)), certain operations will need to get tiled to co-optimize performance and memory utilization. Such tiling patterns create slices of weights that need compressing that will end up with varying sizes. Therefore, the knowledge of some tir::Vars refer to constants are critical in the level of TIR to perform this.

Review comment:
       @junrushao1994, the values do matter: with compression, the entropy of the constant data determines the post-compression size.
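
As a rough illustration of why the values matter, here is a minimal sketch that compresses two equally sized weight slices and compares the results. It assumes `zlib` as a stand-in compressor (the NPU's actual weight-compression scheme is not described here), and the slice contents and the `compressed_size` helper are hypothetical.

```python
import random
import zlib


def compressed_size(values):
    """Return the zlib-compressed size in bytes of int8-range values."""
    # Map each signed value to a single byte before compressing.
    raw = bytes(v & 0xFF for v in values)
    return len(zlib.compress(raw))


# Two hypothetical weight slices with the same shape (4096 elements each).
low_entropy_slice = [1] * 4096
rng = random.Random(0)  # fixed seed so the example is repeatable
high_entropy_slice = [rng.randrange(-128, 128) for _ in range(4096)]

low_size = compressed_size(low_entropy_slice)
high_size = compressed_size(high_entropy_slice)

# Same element count, very different sizes after compression.
print("low-entropy slice :", low_size, "bytes")
print("high-entropy slice:", high_size, "bytes")
```

Both slices occupy 4096 bytes uncompressed, so a pass that only knows the shape cannot predict the compressed size; it needs the values.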

##########
File path: rfcs/0022-tir-non-scalar-constants.md
##########
@@ -0,0 +1,107 @@
+## Memory Planning
+
+The TIR program has the ability to express both inter and intra operator memory requirement, post-scheduling as explained further by [Unified Static Memory Planning RFC](https://github.com/apache/tvm-rfcs/pull/9). It would be better if the constants could be embedded to the TIR PrimFunc. Moreover, this allows various [target-dependent lowerings](https://github.com/apache/tvm-rfcs/pull/10), to produce TIR PrimFuncs with constants in it.

Review comment:
       As I replied before, the values determine the size in the case of compression.
   
   Moreover, knowing the values of the constants directly (as opposed to holding a de-referencing map to get the values) will help in writing lowering passes that have maximum access to knowledge of the program/operator.
   
   May I ask what the benefits are of hiding the constant values from TIR passes?
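
To make the contrast concrete, here is a minimal sketch in plain Python of a pass that needs constant values, once through a side-channel map and once with the constant carried by the IR node. None of this is TVM API: `Var`, `AllocateConst`, and both helper functions are hypothetical stand-ins used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a tir::Var: just a name."""
    name: str


@dataclass(frozen=True)
class AllocateConst:
    """Hypothetical stand-in for a node that binds a Var to constant data."""
    var: Var
    data: tuple
    dtype: str


def count_nonzero_side_channel(var: Var, const_map: Dict[Var, tuple]) -> int:
    # The pass must be handed a separate Var -> data map and keep it in sync.
    return sum(1 for v in const_map[var] if v != 0)


def count_nonzero_embedded(node: AllocateConst) -> int:
    # The values travel with the IR node, so no extra map is needed.
    return sum(1 for v in node.data if v != 0)


w = Var("w")
values = (0, 3, 0, 7)
side = count_nonzero_side_channel(w, {w: values})
embedded = count_nonzero_embedded(AllocateConst(w, values, "int8"))
print(side, embedded)
```

Both calls compute the same answer; the difference is that the first one depends on a map that every pass has to receive and maintain alongside the program.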

##########
File path: rfcs/0022-tir-non-scalar-constants.md
##########
@@ -0,0 +1,107 @@
+## Winograd Constants
+
+The Winograd transformation (used for fast GEMMs) involves multiplication by a hard-coded constant tensor. This is currently accomplished in TE using a complicated TE compute expression with many nested selects. Being able to directly express a constant tensor here would significantly simplify this code.
+
+
+# 3. Guide-level explanation
+
+This is not particularly a user-facing feature and this will allow constants to be 'linked' to TIR. Initially, we are planning to use this with gated on '-link-params' argument for relay.build and TVMC.
+
+# 4. Reference-level explanation
+
+The proposal is quite simple and it could be explained as follows :
+
+```
+@tvm.script.tir
+def myfunc():   
+   param = tir.allocate_const([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "int32", [10])
+```
+
+This follows closely the semantics of tir.allocate and the difference being it represent a buffer filled with constants.

Review comment:
       I don't think where constants are allocated needs to be decided here, based on how we decide to represent constants in TIR.
   The proposal here is to represent constants directly in the TIR PrimFunc, so that they can be lowered in whatever way each target wants them lowered (if the target supports it).
   
   > Will the constants be allocated on stack or on heap?
   
   So currently, where we want to 'link' in the parameters, they will be generated as part of the runtime.Module and linked via the executor: https://github.com/apache/tvm/pull/6917/files.
   For CPU targets, they will go to a .rodata section, which is where the constants are held today, to keep compatibility with the existing linked-param code generation.
   
   I'm not sure we want them on either the stack or the heap; however, we might want them in different sections if the system has multiple non-volatile memories.
   
   >  Is this designed for small matrices (e.g. the small matrix in winograd), or relatively larger matrices (e.g. the weight that needs prefetching)?
   
   This is size-agnostic, so I would not expect a difference here.
   
   > How will lowering and code generation be affected?
   
   This will only be supported (at least initially, as agreed with @tqchen) for targets that support code generation for constants (currently this is controlled by the link-params target argument). Therefore, if a target has this capability (and it is enabled in a given compilation flow), we assume the target knows how to generate code for the constants used by the operator.
   
   > Does it work for GPU and other devices?
   
   I don't think GPU is (currently) a target that supports code generation for constants; therefore, the constants will live in the tvm_main function (as relay.Constants).
   
   > How does it affect linkers' job?
   
   So there are mainly two ways the linker's job could be affected, AFAIK.
   
   1.) If code generation for constants is supported by the respective target, we'll assume the code will be generated with appropriate sections (if the output is C-like) or consumed in any other artifact that expects the constants to be embedded. If the target supports neither, then it is not a target that requires constants to be 'link'ed to the TIR.
   
   2.) If the USMP is invoked, it will pool all the constants, which are then pulled out of tvm_main and exposed to the application layer.
   
   For basic and most use cases, the constant pools will be generated in the metadata module (See U1 of https://github.com/apache/tvm-rfcs/blob/c520a3912279bcd3c432dafd4070661f614882cf/rfcs/0009_Unified_Static_Memory_Planning.md).
   
   For certain use cases (See U3 of https://github.com/apache/tvm-rfcs/blob/c520a3912279bcd3c432dafd4070661f614882cf/rfcs/0009_Unified_Static_Memory_Planning.md), the constant pools would instead live where the user writes the application.
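
As a rough picture of what "pooling" constants could look like, the sketch below packs several named constant blobs into one byte buffer and records the offset of each. This is only an illustration and not the USMP algorithm: the `pool_constants` helper, the constant names, and the 16-byte alignment are all assumptions.

```python
from typing import Dict, Tuple


def pool_constants(
    constants: Dict[str, bytes], alignment: int = 16
) -> Tuple[bytes, Dict[str, int]]:
    """Pack named constant blobs into one buffer; return (pool, name -> offset)."""
    pool = bytearray()
    offsets: Dict[str, int] = {}
    for name, blob in constants.items():
        # Pad so each constant starts at a multiple of `alignment`.
        padding = (-len(pool)) % alignment
        pool.extend(b"\x00" * padding)
        offsets[name] = len(pool)
        pool.extend(blob)
    return bytes(pool), offsets


# Hypothetical constants: a 10-byte weight blob and a 4-byte bias blob.
pool, offsets = pool_constants(
    {"conv1_weights": bytes(10), "conv1_bias": bytes(4)}
)
print(offsets, len(pool))
```

With a layout like this, an operator refers to a constant by its offset into the pool, and the pool itself can be emitted in the metadata module or handed to the application, depending on the use case.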
   
   
   
   
   
   
   
   
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org