Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/08/06 02:30:56 UTC
[GitHub] [tvm] wrongtest-intellif commented on pull request #12200: [TOPI]fix scatterND large shape problem
wrongtest-intellif commented on PR #12200:
URL: https://github.com/apache/tvm/pull/12200#issuecomment-1207129595
Following the sanitizer's hint that a stack overflow is involved, I found a possible LLVM-related issue.
The full dumped LLVM IR: [scatter_nd_stackoverflow.ll.txt](https://github.com/apache/tvm/files/9274249/scatter_nd_stackoverflow.ll.txt)
```ll
; Function Attrs: noinline
define internal fastcc i32 @tvmgen_default_fused_scatter_nd_compute_(i8* noalias align 128 %0, i8* noalias nocapture readonly align 128 %1, i8* noalias align 128 %2, i8* noalias align 128 %3) unnamed_addr #2 {
entry:
call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 128 dereferenceable(1080000) %0, i8* noundef nonnull align 128 dereferenceable(1080000) %1, i64 1080000, i1 false)
br label %for_body_j
for_body_j: ; preds = %for_body_j, %entry
%j1 = phi i32 [ 0, %entry ], [ %13, %for_body_j ]
%4 = alloca %closure_loop_parallel_k, align 8
%5 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 0
store i8* %0, i8** %5, align 8
%6 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 1
store i8* %2, i8** %6, align 8
%7 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 2
store i32 %j1, i32* %7, align 8
%8 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 3
store i8* %3, i8** %8, align 8
%9 = load i32 (i32 (i32, %0*, i8*)*, i8*, i32)*, i32 (i32 (i32, %0*, i8*)*, i8*, i32)** @__TVMBackendParallelLaunch, align 8, !tbaa !5
%10 = bitcast %closure_loop_parallel_k* %4 to i8*
%11 = call i32 %9(i32 (i32, %0*, i8*)* nonnull @__tvm_parallel_lambda, i8* nonnull %10, i32 0)
%12 = icmp ne i32 %11, 0
%13 = add nuw nsw i32 %j1, 1
%exitcond.not = icmp eq i32 %13, 270000
%or.cond = select i1 %12, i1 true, i1 %exitcond.not
br i1 %or.cond, label %common.ret, label %for_body_j, !prof !142
common.ret: ; preds = %for_body_j
ret i32 %11
}
```
The alloca `%4 = alloca %closure_loop_parallel_k, align 8` is generated inside the loop, but an `alloca`'s storage lives until the function returns, so we cannot expect the stack to shrink automatically on each iteration. It is therefore not a good idea to use `alloca` inside large loops, especially when it cannot be promoted to an SSA value.
A quick experimental modification to https://github.com/apache/tvm/blob/main/src/target/llvm/codegen_cpu.cc#L645 makes the segfault disappear in my environment.
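As a rough back-of-the-envelope check of why this loop could overflow the stack, the sketch below mirrors the closure layout from the IR above in plain C++ and multiplies its size by the 270000 iterations of `for_body_j`. The struct and field names, the 64-bit target, and the 8 MiB stack limit (a common Linux default) are assumptions for illustration, not values taken from the dump or from TVM.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of %closure_loop_parallel_k; field names are made up.
struct ClosureLoopParallelK {
  void* ptr0;      // i8* %0
  void* ptr1;      // i8* %2
  std::int32_t j;  // i32 %j1 (followed by padding on 64-bit targets)
  void* ptr2;      // i8* %3
};

// Trip count of the outer loop, taken from `icmp eq i32 %13, 270000`.
constexpr std::size_t kOuterIterations = 270000;

// Assumption: 8 MiB main-thread stack limit; the real limit depends on the environment.
constexpr std::size_t kAssumedStackLimitBytes = 8u * 1024 * 1024;

// Stack consumed when a fresh alloca runs on every iteration and is never released.
constexpr std::size_t StackBytesAllocaInLoop() {
  return sizeof(ClosureLoopParallelK) * kOuterIterations;
}

// Stack consumed when a single alloca in the entry block is reused by every iteration.
constexpr std::size_t StackBytesAllocaHoisted() {
  return sizeof(ClosureLoopParallelK);
}

// True when the given footprint is larger than the assumed stack limit.
constexpr bool ExceedsAssumedStackLimit(std::size_t bytes) {
  return bytes > kAssumedStackLimitBytes;
}
```

On a typical 64-bit target the struct occupies 32 bytes, so the in-loop variant would need about 8,640,000 bytes of stack, slightly above 8 MiB (8,388,608 bytes), while the hoisted variant needs only 32 bytes. This is consistent with the overflow reported by the sanitizer, although the actual limit may differ between environments.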
```c++
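// Remember the block we are currently emitting into.
auto cur_pt = builder_->GetInsertBlock();
// Temporarily move to the function's entry block so the alloca runs once per call.
builder_->SetInsertPoint(&(*(function_->getEntryBlock().getFirstInsertionPt())));
llvm::Value* cvalue = builder_->CreateAlloca(ctype, ConstInt32(1));  // alloca at function begin
// Resume appending at the end of the original block.
builder_->SetInsertPoint(cur_pt);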
```
So I think we could either:
1. Avoid parallelizing the inner loop when the outer loop is too large.
2. Keep a dedicated code segment (perhaps at the function entry) for such special uses of `alloca`.