You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/01/27 16:34:19 UTC

[GitHub] [incubator-doris] acelyc111 opened a new issue #5308: Compaction producer hang and none compaction consumer works

acelyc111 opened a new issue #5308:
URL: https://github.com/apache/incubator-doris/issues/5308


   **Describe the bug**
   I found some BE's compactiom score metric not update anymore, I doubt the compaction not works, and use `pstack` to show all threads, there is no "compaction" words except the compaction producer thread.
   And then use `gcore` to dump the process, the producer's stack:
   ```
   (gdb) t 260
   [Switching to thread 260 (Thread 0x7f499f6fd700 (LWP 5470))]
   #0  0x00007f4a21bd36d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   (gdb) bt
   #0  0x00007f4a21bd36d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   #1  0x000000000263a23c in __gthread_cond_wait (__mutex=<optimized out>, __cond=__cond@entry=0x4cf3e88) at /var/local/gcc/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:864
   #2  std::condition_variable::wait (this=this@entry=0x4cf3e88, __lock=...) at ../../../.././libstdc++-v3/src/c++11/condition_variable.cc:53
   #3  0x0000000000e61bf3 in wait<doris::CompactionPermitLimiter::request(int64_t)::<lambda()> > (__p=..., __lock=..., this=0x4cf3e88) at /usr/include/c++/7.3.0/condition_variable:99
   #4  doris::CompactionPermitLimiter::request (this=this@entry=0x4cf3e50, permits=permits@entry=2792) at /root/doris-env/doris/be/src/olap/compaction_permit_limiter.cpp:30
   #5  0x0000000000dd28be in doris::StorageEngine::_compaction_tasks_producer_callback (this=0x4cf3c00) at /root/doris-env/doris/be/src/olap/olap_server.cpp:354
   #6  0x0000000000dd31ac in operator() (__closure=<optimized out>) at /root/doris-env/doris/be/src/olap/olap_server.cpp:93
   #7  std::_Function_handler<void(), doris::StorageEngine::start_bg_threads()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/7.3.0/bits/std_function.h:316
   #8  0x0000000001117718 in operator() (this=0x6ce84a68) at /usr/include/c++/7.3.0/bits/std_function.h:706
   #9  doris::Thread::supervise_thread (arg=0x6ce84a50) at /root/doris-env/doris/be/src/util/thread.cpp:304
   #10 0x00007f4a21bcfdc5 in start_thread () from /lib64/libpthread.so.0
   #11 0x00007f4a21edb73d in clone () from /lib64/libc.so.6
   (gdb) f 4
   #4  doris::CompactionPermitLimiter::request (this=this@entry=0x4cf3e50, permits=permits@entry=2792) at /root/doris-env/doris/be/src/olap/compaction_permit_limiter.cpp:30
   30	/root/doris-env/doris/be/src/olap/compaction_permit_limiter.cpp: 没有那个文件或目录.
   (gdb) p _used_permits
   $1 = {_value = 0}
   (gdb) p permits
   $2 = 2792
   (gdb) p config::total_permits_for_compaction_score
   $3 = 2000
   (gdb)
   ```
   That means the wait condition is satisfied but the cv is still wait, so there must be some multi-threads sync problems.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   Very occasional to happen.
   
   **Expected behavior**
   All threads works well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wangbo commented on issue #5308: Compaction producer hang and none compaction consumer works

Posted by GitBox <gi...@apache.org>.
wangbo commented on issue #5308:
URL: https://github.com/apache/incubator-doris/issues/5308#issuecomment-768783162


   code version: master
   ```
   bool CompactionPermitLimiter::request(int64_t permits) {
       DorisMetrics::instance()->compaction_waitting_permits->set_value(permits);
       if (permits > config::total_permits_for_compaction_score) {
           // when tablet's compaction score is larger than "config::total_permits_for_compaction_score",
           // it's necessary to do compaction for this tablet because this tablet will not get "permits"
           // anyway. otherwise, compaction task for this tablet will not be executed forever.
           std::unique_lock<std::mutex> lock(_permits_mutex);
           _permits_cv.wait(lock, [=] {
               return _used_permits == 0 ||
                      _used_permits + permits <= config::total_permits_for_compaction_score;
           });
       } else {
           if (_used_permits + permits > config::total_permits_for_compaction_score) {
               std::unique_lock<std::mutex> lock(_permits_mutex);
               _permits_cv.wait(lock, [=] {
                   return _used_permits + permits <= config::total_permits_for_compaction_score;
               });
           }
       }
       _used_permits += permits;  // here _used_permits is updated out of lock, Does this have concurrency issues ?
       DorisMetrics::instance()->compaction_waitting_permits->set_value(0);
       DorisMetrics::instance()->compaction_used_permits->set_value(_used_permits);
       return true;
   }
   ```
   I have a question as in the comment above.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] acelyc111 closed issue #5308: Compaction producer hang and none compaction consumer works

Posted by GitBox <gi...@apache.org>.
acelyc111 closed issue #5308:
URL: https://github.com/apache/incubator-doris/issues/5308


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wangbo commented on issue #5308: Compaction producer hang and none compaction consumer works

Posted by GitBox <gi...@apache.org>.
wangbo commented on issue #5308:
URL: https://github.com/apache/incubator-doris/issues/5308#issuecomment-768790322


   > > code version: master
   > > ```
   > > bool CompactionPermitLimiter::request(int64_t permits) {
   > >     DorisMetrics::instance()->compaction_waitting_permits->set_value(permits);
   > >     if (permits > config::total_permits_for_compaction_score) {
   > >         // when tablet's compaction score is larger than "config::total_permits_for_compaction_score",
   > >         // it's necessary to do compaction for this tablet because this tablet will not get "permits"
   > >         // anyway. otherwise, compaction task for this tablet will not be executed forever.
   > >         std::unique_lock<std::mutex> lock(_permits_mutex);
   > >         _permits_cv.wait(lock, [=] {
   > >             return _used_permits == 0 ||
   > >                    _used_permits + permits <= config::total_permits_for_compaction_score;
   > >         });
   > >     } else {
   > >         if (_used_permits + permits > config::total_permits_for_compaction_score) {
   > >             std::unique_lock<std::mutex> lock(_permits_mutex);
   > >             _permits_cv.wait(lock, [=] {
   > >                 return _used_permits + permits <= config::total_permits_for_compaction_score;
   > >             });
   > >         }
   > >     }
   > >     _used_permits += permits;  // here _used_permits is updated out of lock, Does this have concurrency issues ?
   > >     DorisMetrics::instance()->compaction_waitting_permits->set_value(0);
   > >     DorisMetrics::instance()->compaction_used_permits->set_value(_used_permits);
   > >     return true;
   > > }
   > > ```
   > > 
   > > 
   > > I have a question as in the comment above.
   > 
   > @wangbo
   > `_used_permits` is defined as a type of `AtomicInt64`.
   
   I understand.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] weizuo93 commented on issue #5308: Compaction producer hang and none compaction consumer works

Posted by GitBox <gi...@apache.org>.
weizuo93 commented on issue #5308:
URL: https://github.com/apache/incubator-doris/issues/5308#issuecomment-768787964


   > code version: master
   > 
   > ```
   > bool CompactionPermitLimiter::request(int64_t permits) {
   >     DorisMetrics::instance()->compaction_waitting_permits->set_value(permits);
   >     if (permits > config::total_permits_for_compaction_score) {
   >         // when tablet's compaction score is larger than "config::total_permits_for_compaction_score",
   >         // it's necessary to do compaction for this tablet because this tablet will not get "permits"
   >         // anyway. otherwise, compaction task for this tablet will not be executed forever.
   >         std::unique_lock<std::mutex> lock(_permits_mutex);
   >         _permits_cv.wait(lock, [=] {
   >             return _used_permits == 0 ||
   >                    _used_permits + permits <= config::total_permits_for_compaction_score;
   >         });
   >     } else {
   >         if (_used_permits + permits > config::total_permits_for_compaction_score) {
   >             std::unique_lock<std::mutex> lock(_permits_mutex);
   >             _permits_cv.wait(lock, [=] {
   >                 return _used_permits + permits <= config::total_permits_for_compaction_score;
   >             });
   >         }
   >     }
   >     _used_permits += permits;  // here _used_permits is updated out of lock, Does this have concurrency issues ?
   >     DorisMetrics::instance()->compaction_waitting_permits->set_value(0);
   >     DorisMetrics::instance()->compaction_used_permits->set_value(_used_permits);
   >     return true;
   > }
   > ```
   > 
   > I have a question as in the comment above.
   
   @wangbo 
   `_used_permits` is defined as a type of `AtomicInt64`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org