Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/07/05 12:29:33 UTC

[GitHub] [incubator-doris] ccoffline opened a new issue #6155: [Bug] NPE when replaying CheckConsistencyJob

ccoffline opened a new issue #6155:
URL: https://github.com/apache/incubator-doris/issues/6155


   **Describe the bug**
   An NPE during journal replay caused all 3 FEs to crash, and they cannot recover.
   ```
   2021-07-02 04:22:36,862 ERROR (replayer|83) [EditLog.loadJournal():816] Operation Type 29
   java.lang.NullPointerException: null
           at org.apache.doris.consistency.ConsistencyChecker.replayFinishConsistencyCheck(ConsistencyChecker.java:368) ~[palo-fe.jar:3.4.0]
           at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:339) [palo-fe.jar:3.4.0]
           at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2560) [palo-fe.jar:3.4.0]
           at org.apache.doris.catalog.Catalog$3.runOneCycle(Catalog.java:2344) [palo-fe.jar:3.4.0]
           at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]
   ```
   https://github.com/apache/incubator-doris/blob/d6e6c7815b452d0e262b5c5a7a52fce0880c6117/fe/fe-core/src/main/java/org/apache/doris/consistency/ConsistencyChecker.java#L365-L370
   The previous version of this file did not guard against the NPE either, but it never triggered one.
   https://github.com/apache/incubator-doris/blob/94a81e52c796150333c54838a889be01934983a4/fe/fe-core/src/main/java/org/apache/doris/consistency/ConsistencyChecker.java#L366-L371
   We infer that this NPE is caused by a change in the write order of the editlog. We don't have enough logs to prove what is really going on, but one possible explanation is:
   - `CheckConsistencyJob.tryFinishJob` has already fetched the table and is trying to lock it.
   - The table is dropped just after `tryFinishJob` fetched it.
   - The operation succeeds on the dropped table and writes an editlog entry.
   - A follower replays this editlog entry, crashes, and never recovers.
   https://github.com/apache/incubator-doris/blob/d6e6c7815b452d0e262b5c5a7a52fce0880c6117/fe/fe-core/src/main/java/org/apache/doris/consistency/CheckConsistencyJob.java#L244-L270
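
   The suspected interleaving can be modeled in a few lines. This is only a single-threaded sketch of the hypothesis above, not Doris code: `RaceTable`, `RaceCatalog`, `RaceSketch` and the two op codes are hypothetical names invented for illustration. The master writes a drop entry and then a finish-check entry for the same table, and an unguarded follower replay dereferences a null table.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical, simplified model of the suspected interleaving; not Doris code.
   class RaceTable {
       long lastCheckTime;
   }

   class RaceCatalog {
       final Map<Long, RaceTable> tables = new HashMap<>();
   }

   class RaceSketch {
       // Journal entries are {opType, tableId}. The op codes are made up for this sketch.
       static final int OP_DROP = 0;
       static final int OP_FINISH_CHECK = 1;

       // Master side: produces the journal in the suspected order.
       static List<long[]> masterJournal(RaceCatalog master, long tableId) {
           List<long[]> journal = new ArrayList<>();
           RaceTable stale = master.tables.get(tableId); // the job fetches the table
           master.tables.remove(tableId);                // the table is dropped right after
           journal.add(new long[] {OP_DROP, tableId});
           stale.lastCheckTime = 1L;                     // the job still succeeds on the stale reference
           journal.add(new long[] {OP_FINISH_CHECK, tableId});
           return journal;
       }

       // Follower side: unguarded replay of the journal.
       static void replay(RaceCatalog follower, List<long[]> journal) {
           for (long[] entry : journal) {
               if (entry[0] == OP_DROP) {
                   follower.tables.remove(entry[1]);
               } else {
                   RaceTable table = follower.tables.get(entry[1]);
                   // Throws NullPointerException when the table was dropped by an earlier entry.
                   table.lastCheckTime = 1L;
               }
           }
       }
   }
   ```

   In this model the master never fails, because it holds a stale reference to the table; only the replaying side looks the table up again and hits the null.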
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on issue #6155: [Bug] NPE when replaying CheckConsistencyJob

Posted by GitBox <gi...@apache.org>.
morningman commented on issue #6155:
URL: https://github.com/apache/incubator-doris/issues/6155#issuecomment-874076658


   It may happen. In the current implementation, the table-level read/write lock can cause this problem; see #6136.
   You can simply check whether the table is null and return early in `replayFinishConsistencyCheck()`.
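
   A minimal sketch of such a guard, assuming simplified stand-in types: `SketchDb`, `SketchTable` and `ReplayGuardSketch` are hypothetical and do not mirror the real Doris classes or the real method signature. The replay path skips the entry when the table no longer exists instead of dereferencing null.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical stand-ins for illustration; not the actual Doris classes.
   class SketchTable {
       long lastCheckTime;
   }

   class SketchDb {
       private final Map<Long, SketchTable> tables = new HashMap<>();

       void addTable(long id, SketchTable table) {
           tables.put(id, table);
       }

       void dropTable(long id) {
           tables.remove(id);
       }

       // Returns null when the table has been dropped.
       SketchTable getTable(long id) {
           return tables.get(id);
       }
   }

   class ReplayGuardSketch {
       // Returns true if the journal entry was applied, false if it was skipped.
       static boolean replayFinishConsistencyCheck(SketchDb db, long tableId, long checkTime) {
           SketchTable table = db.getTable(tableId);
           if (table == null) {
               // The table was dropped before this entry was replayed: skip it instead of throwing.
               return false;
           }
           table.lastCheckTime = checkTime;
           return true;
       }
   }
   ```

   Skipping is safe in this sketch because the only effect of the entry is to update state on the table, and that state is gone once the table is dropped.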






[GitHub] [incubator-doris] ccoffline commented on issue #6155: [Bug] NPE when replaying CheckConsistencyJob

Posted by GitBox <gi...@apache.org>.
ccoffline commented on issue #6155:
URL: https://github.com/apache/incubator-doris/issues/6155#issuecomment-888927182


   > It may happen. In the current implementation, the table-level read/write lock can cause this problem; see #6136.
   > You can simply check whether the table is null and return early in `replayFinishConsistencyCheck()`.
   
   After a lot of discussion, we think that changing the return value of `db.getTable(id/name)` to `Optional` would be a good design for avoiding all possible replay crashes when a dropped table is encountered.
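
   Roughly, the idea looks like the sketch below. The types `OptDb`, `OptTable` and `OptionalReplaySketch` are hypothetical and only illustrate the shape of the API; they are not the classes or signatures from the actual change. Returning `Optional` makes every caller handle the missing-table case explicitly.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Optional;

   // Hypothetical stand-ins for illustration; not the actual Doris classes.
   class OptTable {
       long lastCheckTime;
   }

   class OptDb {
       private final Map<Long, OptTable> tables = new HashMap<>();

       void addTable(long id, OptTable table) {
           tables.put(id, table);
       }

       void dropTable(long id) {
           tables.remove(id);
       }

       // An empty Optional replaces the null that callers previously had to remember to check.
       Optional<OptTable> getTable(long id) {
           return Optional.ofNullable(tables.get(id));
       }
   }

   class OptionalReplaySketch {
       // Returns true if the journal entry was applied, false if the table is gone.
       static boolean replayFinishConsistencyCheck(OptDb db, long tableId, long checkTime) {
           Optional<OptTable> table = db.getTable(tableId);
           table.ifPresent(t -> t.lastCheckTime = checkTime);
           return table.isPresent();
       }
   }
   ```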




[GitHub] [incubator-doris] WindyGao edited a comment on issue #6155: [Bug] NPE when replaying CheckConsistencyJob

Posted by GitBox <gi...@apache.org>.
WindyGao edited a comment on issue #6155:
URL: https://github.com/apache/incubator-doris/issues/6155#issuecomment-889484588


   We also encountered this problem in 0.14.12. Is there any way to recover the FE node?




[GitHub] [incubator-doris] ccoffline commented on issue #6155: [Bug] NPE when replaying CheckConsistencyJob

Posted by GitBox <gi...@apache.org>.
ccoffline commented on issue #6155:
URL: https://github.com/apache/incubator-doris/issues/6155#issuecomment-914891735


   fixed by #6386




[GitHub] [incubator-doris] ccoffline closed issue #6155: [Bug] NPE when replaying CheckConsistencyJob

Posted by GitBox <gi...@apache.org>.
ccoffline closed issue #6155:
URL: https://github.com/apache/incubator-doris/issues/6155


   



