Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/08/05 13:53:45 UTC

[GitHub] [incubator-doris] ccoffline opened a new issue #6386: [Proposal] Enforce null check at Catalog.getDb and Database.getTable

ccoffline opened a new issue #6386:
URL: https://github.com/apache/incubator-doris/issues/6386


   Commit #3775, which introduced table locks, has many critical bugs that crash every FE. These bugs are all caused by NPEs thrown while replaying editlogs; see #5378, #5391, #5688, #5973 and #6155.
   In the ideal case, replaying editlogs should never throw an NPE, because the original ops have already checked their preconditions and were logged in the correct order.
   #3775 did not keep edit ops synchronized at the DB level, which upsets the order of the editlogs. This happens when an edit op on a table executes concurrently with another op on that table that holds the db write lock, such as drop/replace/rename.
   
   When disordered editlogs are replayed, the meta may become inconsistent between MASTER and FOLLOWER.
   For instance, if an editlog for a table arrives after the editlog that drops that table, the current approach is to simply ignore it. But if the user recovers that table afterwards, the edit made to the dropped table object is recovered on MASTER but lost on FOLLOWER.
   It gets more complicated with replace/rename ops.
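
   The drop/recover case above can be illustrated with a toy model. This is only a sketch under assumptions: `Node`, its `live` and `recycleBin` maps, and the integer "version" standing in for table state are all hypothetical and are not Doris classes. The master applies ops in the order they really executed, while the follower replays them in the (disordered) order they were logged.

```java
import java.util.HashMap;
import java.util.Map;

public class DisorderedReplaySketch {
    /** Hypothetical FE node: table name -> version counter (stand-in for table meta). */
    static class Node {
        final Map<String, Integer> live = new HashMap<>();
        final Map<String, Integer> recycleBin = new HashMap<>();

        void create(String table) {
            live.put(table, 0);
        }

        /** Moves the table to the recycle bin so it can be recovered later. */
        void drop(String table) {
            Integer version = live.remove(table);
            if (version != null) {
                recycleBin.put(table, version);
            }
        }

        void recover(String table) {
            Integer version = recycleBin.remove(table);
            if (version != null) {
                live.put(table, version);
            }
        }

        /** Applies an edit; returns false (edit ignored) when the table is not live. */
        boolean edit(String table) {
            if (!live.containsKey(table)) {
                return false;
            }
            live.merge(table, 1, Integer::sum);
            return true;
        }
    }

    public static void main(String[] args) {
        // Master: the edit really ran before the drop took effect.
        Node master = new Node();
        master.create("t");
        master.edit("t");
        master.drop("t");
        master.recover("t");

        // Follower: the journal recorded the drop first, so the edit is ignored.
        Node follower = new Node();
        follower.create("t");
        follower.drop("t");
        boolean applied = follower.edit("t");
        follower.recover("t");

        System.out.println("follower applied edit: " + applied);
        System.out.println("master version:   " + master.live.get("t"));
        System.out.println("follower version: " + follower.live.get("t"));
    }
}
```

   After the recover, the two nodes hold different versions of the same table, which is the inconsistency described above.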
   
   It's necessary to make sure the meta is 100% consistent. However, the replay bugs make the cluster completely unavailable, so our most urgent task is to avoid any NPE during replay. When designing the fix, we took the following factors into primary consideration:
   * `Catalog.getDb` and `Database.getTable` should either return an Optional, or return a non-null value and throw an exception when the result would be null. The caller can call `Optional.get` or `Optional.orElse(null)` directly after considering the null case, and the reviewer can easily spot potential NPEs.
   * In the replay routine, `Catalog.getDb` and `Database.getTable` can throw `MetaNotFoundException`, which is caught by `Editlog.loadJournal`. This indicates that the meta may be inconsistent, so we need to log a warning for tracking.
   * This fix should focus only on avoiding replay NPEs and stay consistent with the original processing logic. Mark any potential bugs where possible; they can be discussed later, such as concurrently dropping a database, which hardly ever happens in practice.
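
   A minimal sketch of the lookup style proposed in the list above, not the actual patch. The classes here are simplified stand-ins for the Doris ones, the `getDbOrMetaException` / `getTableOrMetaException` names are hypothetical, and the journal is modelled as plain (db, table) name pairs. The replay loop catches `MetaNotFoundException` and records a warning instead of failing with an NPE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ReplaySketch {
    /** Stand-in for the checked exception named in the proposal. */
    static class MetaNotFoundException extends Exception {
        MetaNotFoundException(String message) {
            super(message);
        }
    }

    static class Table {
        final String name;

        Table(String name) {
            this.name = name;
        }
    }

    static class Database {
        final String name;
        private final Map<String, Table> tables = new HashMap<>();

        Database(String name) {
            this.name = name;
        }

        void addTable(Table table) {
            tables.put(table.name, table);
        }

        /** Never returns null; the caller must handle the empty case explicitly. */
        Optional<Table> getTable(String tableName) {
            return Optional.ofNullable(tables.get(tableName));
        }

        /** Hypothetical helper: non-null result or a checked exception. */
        Table getTableOrMetaException(String tableName) throws MetaNotFoundException {
            return getTable(tableName).orElseThrow(
                    () -> new MetaNotFoundException("unknown table " + name + "." + tableName));
        }
    }

    static class Catalog {
        private final Map<String, Database> dbs = new HashMap<>();

        void addDb(Database db) {
            dbs.put(db.name, db);
        }

        Optional<Database> getDb(String dbName) {
            return Optional.ofNullable(dbs.get(dbName));
        }

        /** Hypothetical helper: non-null result or a checked exception. */
        Database getDbOrMetaException(String dbName) throws MetaNotFoundException {
            return getDb(dbName).orElseThrow(
                    () -> new MetaNotFoundException("unknown database " + dbName));
        }
    }

    /** Replays (db, table) entries; returns one warning per entry whose meta is missing. */
    static List<String> loadJournal(Catalog catalog, List<String[]> journal) {
        List<String> warnings = new ArrayList<>();
        for (String[] entry : journal) {
            try {
                Database db = catalog.getDbOrMetaException(entry[0]);
                Table table = db.getTableOrMetaException(entry[1]);
                // A real replay would apply the edit to `table` here.
            } catch (MetaNotFoundException e) {
                warnings.add("replay skipped, meta may be inconsistent: " + e.getMessage());
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Catalog catalog = new Catalog();
        Database db = new Database("db1");
        db.addTable(new Table("t1"));
        catalog.addDb(db);

        List<String[]> journal = Arrays.asList(
                new String[] {"db1", "t1"},       // meta present
                new String[] {"db1", "dropped"},  // table already dropped
                new String[] {"gone", "t1"});     // database already dropped
        loadJournal(catalog, journal).forEach(System.out::println);
    }
}
```

   Because the exception is checked, a caller that forgets to handle a missing db or table fails at compile time rather than at replay time, which is the property the proposal is after.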
   
   After this, there is still a lot of work to do. We need to fix many inconsistent lock routines and develop concurrency tests for the meta.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman closed issue #6386: [Proposal] Enforce null check at Catalog.getDb and Database.getTable

Posted by GitBox <gi...@apache.org>.
morningman closed issue #6386:
URL: https://github.com/apache/incubator-doris/issues/6386


   

