Posted to commits@doris.apache.org by "chenlinzhong (via GitHub)" <gi...@apache.org> on 2023/01/29 08:37:21 UTC

[GitHub] [doris] chenlinzhong opened a new issue, #16203: [Bug] BufferControlBlock may block all fragment handle threads leads to be out of work

chenlinzhong opened a new issue, #16203:
URL: https://github.com/apache/doris/issues/16203

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   All versions
   
   ### What's Wrong?
   
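   In certain scenarios the current vresultsink implementation gets stuck during execution and can never exit:
   - vresultsink first writes data into the BufferControlBlock.
   - The BufferControlBlock capacity is 4096 rows (hard-coded).
   - When the number of rows in the buffer exceeds that limit, the writer is suspended until FE fetches the data and wakes it up again.
   - If FE's StmtExecutor exits abnormally for some reason, the suspended writer is never woken up and its thread stays blocked. Over time this can block every thread in the fragment handling pool (the default maximum is fragment_pool_thread_num_max=512), at which point the BE stops serving. The heartbeat thread is still healthy, so the BE looks alive while it is effectively hung.
   <img width="851" alt="image" src="https://user-images.githubusercontent.com/11487604/215314234-3faef20c-a509-423b-bf28-8cde5e702045.png">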
   
   ```
   #0  0x00007f9d2f3a143c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   #1  0x0000564a83007bec in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
   #2  0x0000564a7f1f4405 in doris::BufferControlBlock::add_batch(std::unique_ptr<doris::TFetchDataResult, std::default_delete<doris::TFetchDataResult> >&) ()
   #3  0x0000564a809300f0 in doris::vectorized::VMysqlResultWriter::append_block(doris::vectorized::Block&) ()
   #4  0x0000564a808b6df9 in doris::vectorized::VResultSink::send(doris::RuntimeState*, doris::vectorized::Block*) ()
   #5  0x0000564a7f2022d5 in doris::PlanFragmentExecutor::open_vectorized_internal() ()
   #6  0x0000564a7f20393f in doris::PlanFragmentExecutor::open() ()
   ```
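
   The blocking pattern in the stack trace above can be illustrated with a small standalone sketch. This is not the Doris code: `BoundedResultBuffer`, its `add_batch(rows, timeout)` signature, `fetch_all()`, and `cancel()` are hypothetical names made up for illustration. It only shows one possible way a writer waiting on a full buffer could be released by a timeout or a cancellation instead of waiting forever for a consumer that has gone away.

   ```cpp
   #include <cassert>
   #include <chrono>
   #include <condition_variable>
   #include <cstddef>
   #include <mutex>

   // Hypothetical stand-in for a bounded result buffer (NOT Doris's BufferControlBlock).
   class BoundedResultBuffer {
   public:
       explicit BoundedResultBuffer(std::size_t max_rows) : max_rows_(max_rows) {
           assert(max_rows > 0);
       }

       // Producer side. Waits while the buffer is full, but gives up when the
       // timeout expires or the buffer is cancelled. Returns true only if the
       // rows were accepted.
       bool add_batch(std::size_t rows, std::chrono::milliseconds timeout) {
           std::unique_lock<std::mutex> lock(mutex_);
           const bool ready = space_available_.wait_for(lock, timeout, [this] {
               return cancelled_ || buffered_rows_ < max_rows_;
           });
           if (!ready || cancelled_) {
               return false;  // timed out or cancelled: the thread is released
           }
           buffered_rows_ += rows;
           return true;
       }

       // Consumer side. Drains the buffer and wakes any waiting producer.
       std::size_t fetch_all() {
           std::size_t rows = 0;
           {
               std::lock_guard<std::mutex> lock(mutex_);
               rows = buffered_rows_;
               buffered_rows_ = 0;
           }
           space_available_.notify_all();
           return rows;
       }

       // Called when the consumer is known to be gone. Wakes all producers.
       void cancel() {
           {
               std::lock_guard<std::mutex> lock(mutex_);
               cancelled_ = true;
           }
           space_available_.notify_all();
       }

   private:
       std::mutex mutex_;
       std::condition_variable space_available_;
       std::size_t max_rows_;
       std::size_t buffered_rows_ = 0;
       bool cancelled_ = false;
   };
   ```

   With a plain `wait()` and no consumer, the second `add_batch` call on a full buffer would never return. In this sketch it returns `false` once the timeout elapses, so the calling thread can report an error and go back to the pool. The timeout value and the decision to fail the query on timeout are assumptions; a real fix may choose a different policy.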
   
   
   
   
   ### What You Expected?
   
   BufferControlBlock should work correctly and must not leave fragment handling threads blocked indefinitely.
   
   
   ### How to Reproduce?
   
   1. Modify the code as shown:
   <img width="924" alt="image" src="https://user-images.githubusercontent.com/11487604/215314795-639d6d29-0a07-4898-afd7-c1b34d6d9e0f.png">
   <img width="970" alt="image" src="https://user-images.githubusercontent.com/11487604/215314881-8ff0f9bb-6982-4d6f-a96f-0254521f779e.png">
   2. Create a table t_user and insert 100,000 rows.
   3. Run select * from t_user. The problem reproduces almost 100% of the time.
   
   
   ### Anything Else?
   
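   As fewer and fewer threads remain available, this is also one cause of the fragment timeout errors that many of us run into frequently, especially on heavily loaded clusters.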
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] chenlinzhong closed issue #16203: [Bug] BufferControlBlock may block all fragment handle threads leads to be out of work

Posted by "chenlinzhong (via GitHub)" <gi...@apache.org>.
chenlinzhong closed issue #16203: [Bug] BufferControlBlock may block all fragment handle threads leads to be out of work
URL: https://github.com/apache/doris/issues/16203

