Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2019/11/26 05:29:30 UTC

[GitHub] [incubator-tvm] liangfu commented on a change in pull request #4392: [VTA] Enable streamlined GEMM execution

liangfu commented on a change in pull request #4392: [VTA] Enable streamlined GEMM execution
URL: https://github.com/apache/incubator-tvm/pull/4392#discussion_r350549356
 
 

 ##########
 File path: vta/hardware/chisel/src/main/scala/core/TensorGemm.scala
 ##########
 @@ -126,8 +145,7 @@ class MatrixVectorMultiplication(implicit p: Parameters) extends Module {
   })
   val dot = Seq.fill(size)(
     Module(new DotProduct(aBits = inpBits, bBits = wgtBits, size)))
-  val acc = Seq.fill(size)(
-    Module(new Pipe(UInt(accBits.W), latency = log2Ceil(size) + 1)))
+  val acc = Seq.fill(size)(Module(new Pipe(UInt(accBits.W), latency = 2)))
 
 Review comment:
   It's one cycle for the MAC module (which is a fused multiply-adder, FMA), and the other cycle for one `PipeAdder` in the first layer of the accumulator. Therefore, this 1+1=2 latency needs to be smaller than 4 (the FSM iterates over the states `sReadUop :: sComputeIdx :: sReadTensor :: sExe` in the TensorGemm module), so that the accumulated `acc` is available by the `sReadTensor` stage. Does this make sense?
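   To make the timing argument concrete, here is a minimal, hypothetical Chisel sketch (not the actual VTA `TensorGemm` source; the module and port names other than `Pipe` are made up for illustration). It only encodes the relationship described above: the accumulator pipe uses `latency = 2` (1 cycle MAC + 1 cycle `PipeAdder`), and that must stay below the 4-cycle FSM round trip.
   
   ```scala
   import chisel3._
   import chisel3.util.{Pipe, Valid}
   
   // Hypothetical sketch of just the accumulator pipe, assuming Chisel 3.
   // accBits default is a placeholder, not the VTA configuration.
   class AccPipeSketch(accBits: Int = 32) extends Module {
     val io = IO(new Bundle {
       val in  = Input(Valid(UInt(accBits.W)))
       val out = Output(Valid(UInt(accBits.W)))
     })
   
     // 1 cycle for the MAC (fused multiply-add) + 1 cycle for the first
     // PipeAdder layer of the accumulator => total latency of 2 cycles.
     val accLatency = 2
   
     // The TensorGemm FSM spends 4 cycles iterating
     // sReadUop -> sComputeIdx -> sReadTensor -> sExe, so the accumulated
     // value must be ready before sReadTensor comes around again.
     require(accLatency < 4, "acc must be ready before the next sReadTensor")
   
     val acc = Module(new Pipe(UInt(accBits.W), latency = accLatency))
     acc.io.enq := io.in
     io.out := acc.io.deq
   }
   ```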

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services