You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by "janetsc (via GitHub)" <gi...@apache.org> on 2023/01/31 17:27:27 UTC

[GitHub] [tvm] janetsc commented on a diff in pull request #13844: [Hexagon] Detect and flush "dirty" cache prior to DMA read with cache bypass enabled

janetsc commented on code in PR #13844:
URL: https://github.com/apache/tvm/pull/13844#discussion_r1092261997


##########
src/tir/transforms/lower_async_dma.cc:
##########
@@ -192,19 +198,33 @@ class AsyncDMALowerer : public StmtExprMutator {
       // save queue ID for inspection in `wait` transform
       queue_ids_.insert(queue_id);
 
-      return Evaluate(Call(DataType::Int(32), builtin::dma_copy(),
-                           {queue_id,
-                            Call(DataType::Handle(), builtin::address_of(),
-                                 {BufferLoad(bufferstorenode->buffer, store_index)}),
-                            Call(DataType::Handle(), builtin::address_of(),
-                                 {BufferLoad(bufferloadnode->buffer, load_index)}),
-                            for_loop->extent * bufferloadnode->dtype.bytes(), dma_bypass_cache_}));
+      auto call_dma_copy =
+          Evaluate(Call(DataType::Int(32), builtin::dma_copy(),
+                        {queue_id,
+                         Call(DataType::Handle(), builtin::address_of(),
+                              {BufferLoad(bufferstorenode->buffer, store_index)}),
+                         Call(DataType::Handle(), builtin::address_of(),
+                              {BufferLoad(bufferloadnode->buffer, load_index)}),
+                         for_loop->extent * bufferloadnode->dtype.bytes(), dma_bypass_cache_}));
+
+      // if the buffer we are about to DMA was modified by the primfunc
+      // then we need to flush the buffer from the cache prior to the DMA

Review Comment:
   So the next PR could have:
   
   1.  Unit tests that expose the root of the problem (if they don't, we should go back to the drawing board)
   2.  Additions to the matmul test that expose the need to invalidate in the VTCM to DDR direction
   3. Changes to insert flush in the right place and invalidate in the right place.
   
   The main reason I was advocating to do both directions now is that I'd like to confirm that those two unit tests do indeed fail.  If they don't, we'll need to do more experiments to get to the root of it.  The unit tests (from our offline discussion):
   
   DDR to VTCM
       Allocate two buffers in DDR and VTCM
       Write "0xbeefbeef" to the DDR buffer.  Flush.
       Write "0xfaceface" to the DDR buffer.  Do NOT flush.
       DMA from DDR to VTCM with BYPASS ON.
   Compare DDR and VTCM buffers.  They should not match, because the real values you wanted in DDR weren't flushed to DDR.
   
   VTCM to DDR
       Allocate two buffers in DDR and VTCM
       Write "0xbeefbeef" to the DDR buffer.  (You can flush or not - it actually won't matter.)
       Write "0xfaceface" to the VTCM buffer.
       DMA from VTCM to DDR with BYPASS ON.
   Compare DDR and VTCM buffers.  They will not match because there were stale values in the L2 cache for the DDR buffer, so it won't pick up what you wrote with BYPASS ON.  DDR needed to be invalidated first.
   
   All that said, with a flush-only operation for DMA, I'll sign off on this as a good incremental step.  Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org