Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/24 12:18:30 UTC

[GitHub] [doris] SWJTU-ZhangLei opened a new issue, #10410: [Bug] [Fe] BDBJE conflict socket when adding a new fe

SWJTU-ZhangLei opened a new issue, #10410:
URL: https://github.com/apache/doris/issues/10410

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   root@regtest-15-bj:/home/zhanglei/test/output_2/be# ./lib/palo_be --version
   trunk RELEASE (build git://regtest-15-bj/home/zhanglei/incubator-doris/be/../@3370c105286ac9f2d590d0bf43f811a5cb52171e)
   Built on Fri, 24 Jun 2022 14:04:29 CST by root@regtest-15-bj
   
   ### What's Wrong?
   
   When adding a new FE, the new FE reports the following exception:
   2022-06-24 20:00:09,620 INFO (main|1) [Catalog.loadBackupHandler():1781] finished replay backupHandler from image
   2022-06-24 20:00:09,622 INFO (main|1) [Catalog.loadPaloAuth():1794] finished replay paloAuth from image
   2022-06-24 20:00:09,622 INFO (main|1) [Catalog.loadTransactionState():1802] finished replay transactionState from image
   2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadColocateTableIndex():1830] finished replay colocateTableIndex from image
   2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadRoutineLoadJobs():1836] finished replay routineLoadJobs from image
   2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadLoadJobsV2():1842] finished replay loadJobsV2 from image
   2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadSmallFiles():1854] finished replay smallFiles from image
   2022-06-24 20:00:09,623 INFO (main|1) [Catalog.loadPlugins():4746] finished replay plugins from image
   2022-06-24 20:00:09,666 INFO (main|1) [Catalog.loadDeleteHandler():1787] finished replay deleteHandler from image
   2022-06-24 20:00:09,667 INFO (main|1) [Catalog.loadSqlBlockRule():1862] finished replay sqlBlockRule from image
   2022-06-24 20:00:09,671 INFO (main|1) [Catalog.loadPolicy():1873] finished replay policy from image
   2022-06-24 20:00:09,671 INFO (main|1) [MetaReader.read():104] finished to load image in 257 ms
   2022-06-24 20:00:09,993 INFO (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():160] add helper[172.21.16.15:29010] as ReplicationGroupAdmin
   2022-06-24 20:00:09,993 INFO (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():166] add self[172.21.16.12:29010] as ReplicationGroupAdmin
   2022-06-24 20:00:09,995 WARN (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [Catalog.notifyNewFETypeTransfer():2267] notify new FE type transfer: UNKNOWN
   2022-06-24 20:00:10,014 WARN (RepNode 172.21.16.12_29010_1656071922199(-1)|67) [BDBStateChangeListener.stateChange():57] this node is DETACHED
   2022-06-24 20:00:20,001 ERROR (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():199] error to open replicated environment. will exit.
   com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb  Feeder: 172.21.16.15_29010_1656058773192(4). com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New or moved node:172.21.16.12_29010_1656071922199, is configured with the socket address: /172.21.16.12:29010.  It conflicts with the socket already used by the member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1)
           at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:151) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.Environment.<init>(Environment.java:258) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538) ~[je-18.3.12.jar:18.3.12]
           at org.apache.doris.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:152) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:302) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.persist.EditLog.open(EditLog.java:889) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.catalog.Catalog.initialize(Catalog.java:812) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.PaloFe.start(PaloFe.java:128) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.PaloFe.main(PaloFe.java:63) ~[palo-fe.jar:1.0-SNAPSHOT]
   Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb  Feeder: 172.21.16.15_29010_1656058773192(4). com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New or moved node:172.21.16.12_29010_1656071922199, is configured with the socket address: /172.21.16.12:29010.  It conflicts with the socket already used by the member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1)
           at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.12.jar:18.3.12]
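
The `NodeConflictException` in the log says that the new node name maps to a socket address already held by a differently named group member. A minimal Python sketch of that kind of check follows. It is an illustration only, not BDBJE's actual implementation: the `NodeName` class, the `find_socket_conflict` helper, and the `<host>_<port>_<timestamp>` name layout are assumptions inferred from the node names that appear in the log.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeName:
    """Hypothetical model of an FE node name as it appears in the log."""
    host: str
    port: int
    created_ms: int  # assumption: the trailing field is a creation timestamp

    @classmethod
    def parse(cls, name: str) -> "NodeName":
        # Assumed layout "<host>_<port>_<timestamp>", inferred from the log.
        host, port, created_ms = name.rsplit("_", 2)
        return cls(host, int(port), int(created_ms))

    @property
    def socket(self) -> str:
        return f"{self.host}:{self.port}"


def find_socket_conflict(new_name: str, members: Iterable[str]) -> Optional[str]:
    """Return a differently named member that already uses new_name's socket."""
    new = NodeName.parse(new_name)
    for member in members:
        if member != new_name and NodeName.parse(member).socket == new.socket:
            return member
    return None


# Node names copied from the error message in the log.
members = [
    "172.21.16.15_29010_1656058773192",  # the feeder
    "172.21.16.12_29010_1656058910620",  # member named in the error message
]
conflict = find_socket_conflict("172.21.16.12_29010_1656071922199", members)
print(conflict)
```

Under these assumptions the new name and the registered member differ only in the trailing field, so both resolve to `172.21.16.12:29010`, which matches the conflict the exception describes.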
   
   ### What You Expected?
   
   The FE is added successfully.
   
   ### How to Reproduce?
   
   1. Build a cluster with 3 FEs (fe1, fe2, fe3), with fe1 as master.
   2. Stop all FEs.
   3. Set metadata_failure_recovery=true for fe1 (master) and start fe1.
   4. Remove the metadata_failure_recovery config and restart fe1.
   5. Connect to fe1 with a MySQL client and drop fe2 and fe3.
   6. Add fe2, clear fe2's meta, then start fe2 with --helper fe1.
   7. Start fe3 with --helper fe1. fe3's log prints the following:
   2022-06-24 19:56:33,653 INFO (UNKNOWN 172.21.16.12_29010_1656058910620(-1)|1) [BDBEnvironment.setup():160] add helper[172.21.16.15:29010] as ReplicationGroupAdmin
   2022-06-24 19:56:33,654 INFO (UNKNOWN 172.21.16.12_29010_1656058910620(-1)|1) [BDBEnvironment.setup():166] add self[172.21.16.12:29010] as ReplicationGroupAdmin
   2022-06-24 19:56:33,657 WARN (UNKNOWN 172.21.16.12_29010_1656058910620(-1)|1) [Catalog.notifyNewFETypeTransfer():2267] notify new FE type transfer: UNKNOWN
   2022-06-24 19:56:33,675 WARN (RepNode 172.21.16.12_29010_1656058910620(-1)|64) [BDBStateChangeListener.stateChange():57] this node is DETACHED
   2022-06-24 19:56:43,671 ERROR (UNKNOWN 172.21.16.12_29010_1656058910620(-1)|1) [BDBEnvironment.setup():199] error to open replicated environment. will exit.
   com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656058910620(3):/home/zhanglei/test/output_2/fe/doris-meta/bdb  Feeder: 172.21.16.15_29010_1656058773192(4). The environments have the same name: PALO_JOURNAL_GROUP but represent different environment instances. The environment at the master has UUID 4e0bedad-1111-4c65-92a8-e6be60308d7b, while the replica 172.21.16.12_29010_1656058910620 has UUID: 25c525de-eadf-4ace-892d-523be019caa4 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656058910620(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656058910620(-1)
           at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:151) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.Environment.<init>(Environment.java:258) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538) ~[je-18.3.12.jar:18.3.12]
           at org.apache.doris.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:152) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:302) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.persist.EditLog.open(EditLog.java:889) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.catalog.Catalog.initialize(Catalog.java:812) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.PaloFe.start(PaloFe.java:128) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.PaloFe.main(PaloFe.java:63) ~[palo-fe.jar:1.0-SNAPSHOT]
   Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656058910620(3):/home/zhanglei/test/output_2/fe/doris-meta/bdb  Feeder: 172.21.16.15_29010_1656058773192(4). The environments have the same name: PALO_JOURNAL_GROUP but represent different environment instances. The environment at the master has UUID 4e0bedad-1111-4c65-92a8-e6be60308d7b, while the replica 172.21.16.12_29010_1656058910620 has UUID: 25c525de-eadf-4ace-892d-523be019caa4 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656058910620(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656058910620(-1)
           at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.12.jar:18.3.12]
   
   8. Add fe3, clear fe3's meta, then start fe3 with --helper fe1. fe3 cannot start and reports this error:
   2022-06-24 20:00:10,014 WARN (RepNode 172.21.16.12_29010_1656071922199(-1)|67) [BDBStateChangeListener.stateChange():57] this node is DETACHED
   2022-06-24 20:00:20,001 ERROR (UNKNOWN 172.21.16.12_29010_1656071922199(-1)|1) [BDBEnvironment.setup():199] error to open replicated environment. will exit.
   com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb  Feeder: 172.21.16.15_29010_1656058773192(4). com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New or moved node:172.21.16.12_29010_1656071922199, is configured with the socket address: /172.21.16.12:29010.  It conflicts with the socket already used by the member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1)
           at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:151) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:278) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.Environment.<init>(Environment.java:258) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:605) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:464) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.ReplicatedEnvironment.<init>(ReplicatedEnvironment.java:538) ~[je-18.3.12.jar:18.3.12]
           at org.apache.doris.journal.bdbje.BDBEnvironment.setup(BDBEnvironment.java:152) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:302) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.persist.EditLog.open(EditLog.java:889) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.catalog.Catalog.initialize(Catalog.java:812) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.PaloFe.start(PaloFe.java:128) ~[palo-fe.jar:1.0-SNAPSHOT]
           at org.apache.doris.PaloFe.main(PaloFe.java:63) ~[palo-fe.jar:1.0-SNAPSHOT]
   Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.16.12_29010_1656071922199(-1):/home/zhanglei/test/output_2/fe/doris-meta/bdb  Feeder: 172.21.16.15_29010_1656058773192(4). com.sleepycat.je.rep.impl.RepGroupImpl$NodeConflictException: (JE 18.3.12) New or moved node:172.21.16.12_29010_1656071922199, is configured with the socket address: /172.21.16.12:29010.  It conflicts with the socket already used by the member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1) Originally thrown by HA thread: RepNode 172.21.16.12_29010_1656071922199(-1)
           at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.12.jar:18.3.12]
           at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.12.jar:18.3.12]
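
Steps 7 and 8 fail with two different handshake errors: step 7 reports a group UUID mismatch, while step 8 reports a socket conflict. A rough Python sketch for telling them apart when triaging FE logs is given here. The regular expressions and the `classify` helper are assumptions written for this report, not part of Doris or BDBJE, and they match only the message wording quoted in the logs of this issue (JE 18.3.12); the two sample strings are excerpts of those messages.

```python
import re

# Pattern for the step 7 message (same group name, different environment UUIDs).
UUID_MISMATCH = re.compile(
    r"environment at the master has UUID (?P<master_uuid>[0-9a-f-]{36}), "
    r"while the replica (?P<replica_node>\S+) has UUID: (?P<replica_uuid>[0-9a-f-]{36})"
)

# Pattern for the step 8 message (new node name reuses a registered socket).
SOCKET_CONFLICT = re.compile(
    r"New or moved node:(?P<new_node>[^,\s]+), is configured with the socket address: "
    r"/(?P<socket>\S+?)\.\s+It conflicts with the socket already used by the member: "
    r"(?P<member>\S+)"
)


def classify(message: str) -> dict:
    """Label a handshake error message and extract the identifiers it names."""
    match = UUID_MISMATCH.search(message)
    if match:
        return {"kind": "uuid_mismatch", **match.groupdict()}
    match = SOCKET_CONFLICT.search(message)
    if match:
        return {"kind": "socket_conflict", **match.groupdict()}
    return {"kind": "unknown"}


# Excerpts copied from the step 7 and step 8 logs.
step7_msg = (
    "The environments have the same name: PALO_JOURNAL_GROUP but represent "
    "different environment instances. The environment at the master has UUID "
    "4e0bedad-1111-4c65-92a8-e6be60308d7b, while the replica "
    "172.21.16.12_29010_1656058910620 has UUID: "
    "25c525de-eadf-4ace-892d-523be019caa4 HANDSHAKE_ERROR"
)
step8_msg = (
    "New or moved node:172.21.16.12_29010_1656071922199, is configured with "
    "the socket address: /172.21.16.12:29010.  It conflicts with the socket "
    "already used by the member: 172.21.16.12_29010_1656058910620 HANDSHAKE_ERROR"
)

step7 = classify(step7_msg)
step8 = classify(step8_msg)
print(step7["kind"], step8["kind"])
```

Both failures are raised from `ReplicaFeederHandshake.verifyMembership` according to the stack traces, so the message text is the only field in these logs that distinguishes them.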
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] dataroaring closed issue #10410: [Bug] [Fe] BDBJE conflict socket when adding a new fe

Posted by GitBox <gi...@apache.org>.
dataroaring closed issue #10410: [Bug] [Fe] BDBJE conflict socket when adding a new fe
URL: https://github.com/apache/doris/issues/10410

