Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/12/01 11:27:06 UTC

[GitHub] [incubator-doris] acelyc111 opened a new issue #4996: [Bug] Should not assume the tablet is newer by it's load time when BE start

acelyc111 opened a new issue #4996:
URL: https://github.com/apache/incubator-doris/issues/4996


   **Describe the bug**
   I found a core dump. The backtrace looks like this:
   ```
   Program terminated with signal 6, Aborted.
   #0  0x00007fca7abcb1d7 in raise () from /lib64/libc.so.6
   Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64
   (gdb) bt
   #0  0x00007fca7abcb1d7 in raise () from /lib64/libc.so.6
   #1  0x00007fca7abcc8c8 in abort () from /lib64/libc.so.6
   #2  0x0000000001b13376 in google::DumpStackTraceAndExit () at src/utilities.cc:147
   #3  0x0000000001b0a67d in google::LogMessage::Fail () at src/logging.cc:1599
   #4  0x0000000001b0c504 in google::LogMessage::SendToLog (this=0x7fca74df2770) at src/logging.cc:1553
   #5  0x0000000001b0a1a4 in google::LogMessage::Flush (this=0x7fca74df2770) at src/logging.cc:1422
   #6  0x0000000001b0cf39 in google::LogMessageFatal::~LogMessageFatal (this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:2125
   #7  0x0000000000e26694 in doris::DataDir::load (this=0x4d74f00) at /builds/olap/doris/be/src/olap/data_dir.cpp:705
   #8  0x0000000000e09dd9 in operator() (__closure=0x5349558) at /builds/olap/doris/be/src/olap/storage_engine.cpp:149
   #9  __invoke_impl<void, doris::StorageEngine::load_data_dirs(const std::vector<doris::DataDir*>&)::<lambda()> > (__f=...) at /usr/include/c++/7.3.0/bits/invoke.h:60
   #10 __invoke<doris::StorageEngine::load_data_dirs(const std::vector<doris::DataDir*>&)::<lambda()> > (__fn=...) at /usr/include/c++/7.3.0/bits/invoke.h:95
   #11 _M_invoke<0> (this=0x5349558) at /usr/include/c++/7.3.0/thread:234
   #12 operator() (this=0x5349558) at /usr/include/c++/7.3.0/thread:243
   #13 std::thread::_State_impl<std::thread::_Invoker<std::tuple<doris::StorageEngine::load_data_dirs(const std::vector<doris::DataDir*>&)::<lambda()> > > >::_M_run(void) (this=0x5349550) at /usr/include/c++/7.3.0/thread:186
   #14 0x00000000026b642f in std::execute_native_thread_routine (__p=0x5349550) at ../../../.././libstdc++-v3/src/c++11/thread.cc:83
   #15 0x00007fca7a981dc5 in start_thread () from /lib64/libpthread.so.0
   #16 0x00007fca7ac8d73d in clone () from /lib64/libc.so.6
   ```
   I checked the related log:
   ```
   W1201 14:52:13.408074 183882 tablet_manager.cpp:155] add duplicated tablet. force=0, res=-500, tablet_id=5164922, schema_hash=502924845, old_version=2, new_version=2, old_time=1606138765, new_time=1599296476, old_tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd1/data/325/5164922/502924845, new_tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd2/data/64/5164922/502924845
   W1201 14:52:13.408120 183882 tablet_manager.cpp:843] fail to add tablet. tablet=5164922.502924845.1848811be2b4e08b-4abe001b0545fcb3[res=-500]
   W1201 14:52:13.408583 183882 data_dir.cpp:690] load tablet from header failed. status:-500, tablet=5164922.502924845            // !!!critical log
   W1201 14:52:13.409047 183882 alpha_rowset.cpp:327] tablet: 5164930 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.409586 183882 alpha_rowset.cpp:327] tablet: 5164990 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.410159 183882 alpha_rowset.cpp:327] tablet: 5165054 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.410725 183882 alpha_rowset.cpp:327] tablet: 5165078 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   I1201 14:52:13.410773 183882 tablet_manager.cpp:461] begin drop tablet. tablet_id=5165078, schema_hash=502924845
   I1201 14:52:13.410786 183882 tablet_manager.cpp:1387] set tablet to shutdown state and remove it from memory. tablet_id=5165078, schema_hash=502924845, tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd1/data/162/5165078/502924845
   I1201 14:52:13.411496 183882 tablet_meta_manager.cpp:115] save tablet meta , key:tabletmeta_5165078_502924845 meta_size=93382
   W1201 14:52:13.411962 183882 tablet_manager.cpp:155] add duplicated tablet. force=0, res=0, tablet_id=5165078, schema_hash=502924845, old_version=2, new_version=2, old_time=1599296540, new_time=1606161418, old_tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd1/data/162/5165078/502924845, new_tablet_path=/home/work/app/doris/c3prc-hadoop-test/be/ssd2/data/506/5165078/502924845
   W1201 14:52:13.412612 183882 alpha_rowset.cpp:327] tablet: 5165122 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.413225 183882 alpha_rowset.cpp:327] tablet: 5165158 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:13.413820 183882 alpha_rowset.cpp:327] tablet: 5165170 expect zone map size is 253, actual num is 4. If this is not the first start after upgrade, please pay attention!
   W1201 14:52:15.418694 183882 data_dir.cpp:700] load tablets from header failed, loaded tablet: 45330, error tablet: 1, path: /home/work/app/doris/c3prc-hadoop-test/be/ssd2
   F1201 14:52:15.418807 183882 data_dir.cpp:705] load tablets encounter failure. stop BE process. path: /home/work/app/doris/c3prc-hadoop-test/be/ssd2
   ```
   The log shows that loading a tablet from another data dir with the same tablet id as an already loaded tablet can fail (res=-500), after which the BE exits.
   After reading the code:
   https://github.com/apache/incubator-doris/blob/df1f06e60b1339ef6e2756d0c4cb492cb64986c7/be/src/olap/tablet_manager.cpp#L130-L151
   
   I suspect this is a bug. Data dirs are loaded in parallel by multiple threads, so a tablet loaded later may be older than one loaded earlier. We should not assume that a later-loaded tablet must be newer (as judged by version and creation time).
   
   **Expected behavior**
   When an older tablet is loaded after a newer one with the same tablet id, just skip it.
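
A minimal sketch of the skip-if-older behavior, under stated assumptions. It does not reproduce the code in `tablet_manager.cpp`: `TabletInfo`, `TabletRegistry`, `AddResult` and `is_newer` are hypothetical names invented for illustration, and the ordering rule (higher version first, creation time as tie-breaker) is taken only from the "judged by version and create time" description above. The point is that the outcome should not depend on which data dir thread gets there first.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for tablet metadata (not the Doris class).
struct TabletInfo {
    int64_t tablet_id;
    int32_t schema_hash;
    int64_t version;        // highest version held by this copy
    int64_t creation_time;  // seconds since epoch
    std::string path;       // data dir path of this copy
};

enum class AddResult { kAdded, kReplacedOlder, kSkipped };

// True when `a` is strictly newer than `b`: the higher version wins,
// and creation time breaks ties.
inline bool is_newer(const TabletInfo& a, const TabletInfo& b) {
    if (a.version != b.version) return a.version > b.version;
    return a.creation_time > b.creation_time;
}

// Hypothetical registry shared by the per-data-dir loader threads.
class TabletRegistry {
public:
    // Keeps whichever copy is newer, regardless of load order.
    // An incoming copy that is older than (or equal to) the existing one
    // is skipped instead of being reported as a load failure.
    AddResult add(const TabletInfo& incoming) {
        std::lock_guard<std::mutex> guard(_lock);
        const auto key = std::make_pair(incoming.tablet_id, incoming.schema_hash);
        auto it = _tablets.find(key);
        if (it == _tablets.end()) {
            _tablets.emplace(key, incoming);
            return AddResult::kAdded;
        }
        if (is_newer(incoming, it->second)) {
            it->second = incoming;
            return AddResult::kReplacedOlder;
        }
        return AddResult::kSkipped;
    }

    // Returns the path of the registered copy, or an empty string if absent.
    std::string path_of(int64_t tablet_id, int32_t schema_hash) const {
        std::lock_guard<std::mutex> guard(_lock);
        auto it = _tablets.find(std::make_pair(tablet_id, schema_hash));
        return it == _tablets.end() ? std::string() : it->second.path;
    }

private:
    mutable std::mutex _lock;
    std::map<std::pair<int64_t, int32_t>, TabletInfo> _tablets;
};
```

With the values from the log for tablet 5164922 (version 2 on both copies, old_time=1606138765, new_time=1599296476), adding the two copies in either order leaves the copy with the later creation time registered, and the older copy produces `kSkipped` rather than an error that would stop the process.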


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] imay closed issue #4996: [Bug] Should not assume the tablet is newer by it's load time when BE start

Posted by GitBox <gi...@apache.org>.
imay closed issue #4996:
URL: https://github.com/apache/incubator-doris/issues/4996


   

