Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/27 13:41:03 UTC

[GitHub] [incubator-doris] morningman opened a new issue #3406: [Bug] Wrong thread start order causes segfault with signal 11

morningman opened a new issue #3406:
URL: https://github.com/apache/incubator-doris/issues/3406


   **Describe the bug**
   Sometimes, after a BE is restarted, it crashes a few seconds later, and `be.out` shows:
   
   ```
   *** Aborted at 1588019672 (unix time) try "date -d @1588019672" if you are using GNU date ***
   PC: @           0xf57722 doris::fs::fs_util::block_manager()
   *** SIGSEGV (@0x248) received by PID 25024 (TID 0x7efe9c1e2700) from PID 584; stack trace: ***
       @     0x7efeabffb3b0 (unknown)
       @           0xf57722 doris::fs::fs_util::block_manager()
       @          0x157ecfa doris::segment_v2::Segment::_parse_footer()
       @          0x158045a doris::segment_v2::Segment::_open()
       @          0x1580910 doris::segment_v2::Segment::open()
       @           0xf114ea doris::BetaRowset::do_load()
       @           0xefa6fe doris::Rowset::load()
       @           0xf1366f doris::BetaRowsetReader::init()
       @           0xe85eb8 doris::Reader::_capture_rs_readers()
       @           0xe890e3 doris::Reader::init()
       @           0xe6fa1f doris::Merger::merge_rowsets()
       @           0xe648ad doris::Compaction::do_compaction_impl()
       @           0xe66a2d doris::Compaction::do_compaction()
       @           0xe66b90 doris::CumulativeCompaction::compact()
       @           0xdeb254 doris::StorageEngine::_perform_cumulative_compaction()
       @           0xe7ef5b doris::StorageEngine::_cumulative_compaction_thread_callback()
       @          0x26181df execute_native_thread_routine
       @     0x7efeabdb0e65 start_thread
       @     0x7efeac0c388d __clone
   ```
   
   This is caused by the wrong initialization order in `src/service/doris_main.cpp`:
   
   ```
   180     doris::StorageEngine* engine = nullptr;
   181     auto st = doris::StorageEngine::open(options, &engine);
   182     if (!st.ok()) {
   183         LOG(FATAL) << "fail to open StorageEngine, res=" << st.get_error_msg();
   184         exit(-1);
   185     }
   186
   187     // start backend service for the coordinator on be_port
   188     auto exec_env = doris::ExecEnv::GetInstance();
   189     doris::ExecEnv::init(exec_env, paths);
   190     exec_env->set_storage_engine(engine);
   ```
   
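   Line 181 opens the `StorageEngine` first, which starts all background threads, such as the
   base and cumulative compaction threads. These threads call `fs::fs_util::block_manager()`.
   
   However, `block_manager()` only becomes available after line 190, where the engine is set on
   the `ExecEnv`. A compaction thread that reaches this call earlier accesses a null pointer
   (consistent with the small fault address `@0x248` in the trace above), which crashes the BE.
   Whether this happens depends on thread timing, which is why the crash is intermittent.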

