Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/08/06 02:30:56 UTC

[GitHub] [tvm] wrongtest-intellif commented on pull request #12200: [TOPI]fix scatterND large shape problem

wrongtest-intellif commented on PR #12200:
URL: https://github.com/apache/tvm/pull/12200#issuecomment-1207129595

   Following the sanitizer's hint that something related to a stack overflow is happening, I found a possible LLVM issue.
   
   The full dumped ll: [scatter_nd_stackoverflow.ll.txt](https://github.com/apache/tvm/files/9274249/scatter_nd_stackoverflow.ll.txt)
   
   ```ll
   ; Function Attrs: noinline
   define internal fastcc i32 @tvmgen_default_fused_scatter_nd_compute_(i8* noalias align 128 %0, i8* noalias nocapture readonly align 128 %1, i8* noalias align 128 %2, i8* noalias align 128 %3) unnamed_addr #2 {
   entry:
     call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 128 dereferenceable(1080000) %0, i8* noundef nonnull align 128 dereferenceable(1080000) %1, i64 1080000, i1 false)
     br label %for_body_j
   
   for_body_j:                                       ; preds = %for_body_j, %entry
     %j1 = phi i32 [ 0, %entry ], [ %13, %for_body_j ]
     %4 = alloca %closure_loop_parallel_k, align 8
     %5 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 0
     store i8* %0, i8** %5, align 8
     %6 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 1
     store i8* %2, i8** %6, align 8
     %7 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 2
     store i32 %j1, i32* %7, align 8
     %8 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 3
     store i8* %3, i8** %8, align 8
     %9 = load i32 (i32 (i32, %0*, i8*)*, i8*, i32)*, i32 (i32 (i32, %0*, i8*)*, i8*, i32)** @__TVMBackendParallelLaunch, align 8, !tbaa !5
     %10 = bitcast %closure_loop_parallel_k* %4 to i8*
     %11 = call i32 %9(i32 (i32, %0*, i8*)* nonnull @__tvm_parallel_lambda, i8* nonnull %10, i32 0)
     %12 = icmp ne i32 %11, 0
     %13 = add nuw nsw i32 %j1, 1
     %exitcond.not = icmp eq i32 %13, 270000
     %or.cond = select i1 %12, i1 true, i1 %exitcond.not
     br i1 %or.cond, label %common.ret, label %for_body_j, !prof !142
   
   common.ret:                                       ; preds = %for_body_j
     ret i32 %11
   }
   ```
   
   The alloca `%4 = alloca %closure_loop_parallel_k, align 8` is generated inside the loop, but an `alloca`'s lifetime extends to the end of the enclosing function; we cannot expect the stack to shrink automatically after each iteration. It is therefore a bad idea to emit `alloca` inside large loops, especially when the allocation cannot be promoted to an SSA value.
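   The effect is easy to reproduce outside TVM. Below is a minimal C++ analogue (my own sketch, not TVM code) using the GCC/Clang-specific `__builtin_alloca`: each iteration's allocation stays live until the function returns, so the frame grows with the trip count instead of reusing one slot.

   ```cpp
   #include <cassert>
   #include <cstdint>

   int main() {
     std::uintptr_t prev = 0;
     bool stack_grew = false;
     for (int i = 0; i < 8; ++i) {
       // Each iteration requests a fresh 64-byte block; none is released
       // until main() returns.
       char* p = static_cast<char*>(__builtin_alloca(64));
       *reinterpret_cast<volatile char*>(p) = 0;  // keep the allocation observable
       std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(p);
       if (prev != 0 && cur != prev) stack_grew = true;  // new address each time
       prev = cur;
     }
     // The frame grew instead of reusing one slot; scale this to ~270000
     // iterations of a large closure struct and the stack overflows.
     assert(stack_grew);
     return 0;
   }
   ```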
   
   A quick experimental modification to https://github.com/apache/tvm/blob/main/src/target/llvm/codegen_cpu.cc#L645 makes the segfault disappear in my environment.
   ```c++
   // Save the current insertion point, emit the alloca at the top of the
   // function's entry block, then restore the insertion point.
   auto cur_pt = builder_->GetInsertBlock();
   builder_->SetInsertPoint(&(*(function_->getEntryBlock().getFirstInsertionPt())));
   llvm::Value* cvalue = builder_->CreateAlloca(ctype, ConstInt32(1));  // alloca at function entry
   builder_->SetInsertPoint(cur_pt);
   ```
   
   So I think we could either:
   1. Avoid parallelizing the inner loop if the outer loop extent is too large.
   2. Reserve a code segment (perhaps at the function entry) for special uses of `alloca`.
   

