Posted to commits@doris.apache.org by "SWJTU-ZhangLei (via GitHub)" <gi...@apache.org> on 2023/04/18 02:33:57 UTC

[GitHub] [doris] SWJTU-ZhangLei opened a new issue, #18766: [Bug] [fe] fe can't start with InsufficientLogException

SWJTU-ZhangLei opened a new issue, #18766:
URL: https://github.com/apache/doris/issues/18766

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   root@VM-0-46-ubuntu:/mnt/hdd01/STRESS_ENV/be# ./lib/doris_be --version
   doris-0.0.0-branch-1.2(AVX2) RELEASE (build git://VM-0-22-ubuntu@62b20b126ff94284b3b9b84a6c2e0e931c157565)
   Built on Fri, 07 Apr 2023 21:45:48 CST by VM-0-22-ubuntu
   
   
   ### What's Wrong?
   
   1. The FE cannot start.
   2. fe.out
   `[2023-04-10 10:52:59] notify new FE type transfer: UNKNOWN
   [2023-04-10 10:52:59] notify new FE type transfer: FOLLOWER
   [2023-04-10 10:52:59] notify new FE type transfer: UNKNOWN
   [2023-04-10 10:52:59] notify new FE type transfer: FOLLOWER
   [2023-04-10 10:52:59] this node is DETACHED
   java.lang.NullPointerException
           at com.sleepycat.je.rep.InsufficientLogException.initRepImpl(InsufficientLogException.java:268)
           at com.sleepycat.je.rep.InsufficientLogException.getRepImpl(InsufficientLogException.java:361)
           at com.sleepycat.je.rep.NetworkRestore.init(NetworkRestore.java:171)
           at com.sleepycat.je.rep.NetworkRestore.execute(NetworkRestore.java:281)
           at org.apache.doris.journal.bdbje.BDBJEJournal.reSetupBdbEnvironment(BDBJEJournal.java:358)
           at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:343)
           at org.apache.doris.persist.EditLog.open(EditLog.java:1038)
           at org.apache.doris.catalog.Env.initialize(Env.java:863)
           at org.apache.doris.PaloFe.start(PaloFe.java:138)
           at org.apache.doris.PaloFe.main(PaloFe.java:73)`
   
   3. fe.log
   `2023-04-10 10:52:59,166 INFO (UNKNOWN 172.21.0.68_9310_1680101228114(-1)|1) [BDBEnvironment.setup():162] add helper[172.21.0.68:9310] as ReplicationGroupAdmin
   2023-04-10 10:52:59,170 WARN (UNKNOWN 172.21.0.68_9310_1680101228114(-1)|1) [Env.notifyNewFETypeTransfer():2373] notify new FE type transfer: UNKNOWN
   2023-04-10 10:52:59,189 WARN (RepNode 172.21.0.68_9310_1680101228114(-1)|62) [Env.notifyNewFETypeTransfer():2373] notify new FE type transfer: FOLLOWER
   2023-04-10 10:52:59,198 WARN (REPLICA 172.21.0.68_9310_1680101228114(1)|62) [Env.notifyNewFETypeTransfer():2373] notify new FE type transfer: UNKNOWN
   2023-04-10 10:52:59,214 WARN (UNKNOWN 172.21.0.68_9310_1680101228114(1)|62) [Env.notifyNewFETypeTransfer():2373] notify new FE type transfer: FOLLOWER
   2023-04-10 10:52:59,228 WARN (REPLICA 172.21.0.68_9310_1680101228114(1)|62) [BDBStateChangeListener.stateChange():57] this node is DETACHED
   2023-04-10 10:52:59,219 WARN (UNKNOWN 172.21.0.68_9310_1680101228114(-1)|1) [BDBJEJournal.reSetupBdbEnvironment():349] catch insufficient log exception. will recover and try again.
   com.sleepycat.je.rep.InsufficientLogException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1680101228114(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed.refreshVLSN=null logProviders=null repImpl=null props=null
           at com.sleepycat.je.rep.InsufficientLogException.wrapSelf(InsufficientLogException.java:340) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:848) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:802) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.log.LogManager.getLogEntryHandleNotFound(LogManager.java:956) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.dbi.DiskOrderedScanner.fetchEntry(DiskOrderedScanner.java:2068) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.dbi.DiskOrderedScanner.fetchAndProcessBINs(DiskOrderedScanner.java:1640) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.dbi.DiskOrderedScanner.scanSerial(DiskOrderedScanner.java:789) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.dbi.DiskOrderedScanner.scan(DiskOrderedScanner.java:708) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.dbi.DatabaseImpl.count(DatabaseImpl.java:1510) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.Database.count(Database.java:2042) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:257) ~[doris-fe.jar:1.2-SNAPSHOT]
           at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:339) ~[doris-fe.jar:1.2-SNAPSHOT]
           at org.apache.doris.persist.EditLog.open(EditLog.java:1038) ~[doris-fe.jar:1.2-SNAPSHOT]
           at org.apache.doris.catalog.Env.initialize(Env.java:863) ~[doris-fe.jar:1.2-SNAPSHOT]
           at org.apache.doris.PaloFe.start(PaloFe.java:138) ~[doris-fe.jar:1.2-SNAPSHOT]
           at org.apache.doris.PaloFe.main(PaloFe.java:73) ~[doris-fe.jar:1.2-SNAPSHOT]
   Caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1680101228114(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed. Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1680101228114(1) Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1680101228114(1)
           at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupLogRefresh(ReplicaFeederSyncup.java:706) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:355) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:164) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:732) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
           at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]`
   
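The wrapper exception message in fe.log ends with several `name=null` fields, and the NullPointerException in fe.out is raised from `InsufficientLogException.initRepImpl` while `NetworkRestore` is running. This suggests, but does not prove, that the restore was attempted on a wrapper exception that lacks the replication state. Below is a minimal Python sketch for pulling those null fields out of a log message when triaging similar reports. The helper name `null_fields` is hypothetical and is not part of Doris or JE.

```python
import re

# Matches `name=null` tokens such as `repImpl=null` in an exception message.
NULL_FIELD = re.compile(r"\b([A-Za-z_]\w*)=null\b")


def null_fields(message):
    """Return the names of fields printed as `name=null`, in order of appearance."""
    return NULL_FIELD.findall(message)


# Tail of the wrapper exception message, copied from the fe.log excerpt above.
wrapper_tail = (
    "Environment is invalid and must be closed."
    "refreshVLSN=null logProviders=null repImpl=null props=null"
)

if __name__ == "__main__":
    for name in null_fields(wrapper_tail):
        print(name)
```

Running the same helper on the `Caused by:` message in the excerpt returns an empty list, because that message does not print these fields at all.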
   ### What You Expected?
   
   The FE should start normally.
   
   ### How to Reproduce?
   
   This is hard to reproduce. The following steps are how we hit the problem in our environment.
   
   1. Build a cluster with 3 FEs and 3 BEs.
   2. Continuously import and query data through a follower FE's IP.
   3. At some point, the master FE ran out of memory (OOM).
   4. About ten hours later, we tried to start the master FE and found that it could not start.
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yuanyuan8983 commented on issue #18766: [Bug] [fe] fe can't start with InsufficientLogException

Posted by "yuanyuan8983 (via GitHub)" <gi...@apache.org>.
yuanyuan8983 commented on issue #18766:
URL: https://github.com/apache/doris/issues/18766#issuecomment-1512449438

    Has the FE node's IP address changed? Doris currently does not support changing an FE's IP address. You can add me on WeChat: cyllyy810222




[GitHub] [doris] SWJTU-ZhangLei commented on issue #18766: [Bug] [fe] fe can't start with InsufficientLogException

Posted by "SWJTU-ZhangLei (via GitHub)" <gi...@apache.org>.
SWJTU-ZhangLei commented on issue #18766:
URL: https://github.com/apache/doris/issues/18766#issuecomment-1537078689

   ![image](https://user-images.githubusercontent.com/27994433/236610049-75f0a2c7-d6bc-4a35-955f-26791ec8b35f.png)
   




[GitHub] [doris] SWJTU-ZhangLei commented on issue #18766: [Bug] [fe] fe can't start with InsufficientLogException

Posted by "SWJTU-ZhangLei (via GitHub)" <gi...@apache.org>.
SWJTU-ZhangLei commented on issue #18766:
URL: https://github.com/apache/doris/issues/18766#issuecomment-1537078860

   ![image](https://user-images.githubusercontent.com/27994433/236610077-7cdbe409-745b-4460-ad9e-4f67ccc1992c.png)
   PR 18777 does not fix the problem.




[GitHub] [doris] yiguolei closed issue #18766: [Bug] [fe] fe can't start with InsufficientLogException

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei closed issue #18766: [Bug] [fe] fe can't start with InsufficientLogException
URL: https://github.com/apache/doris/issues/18766

