Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/08/21 06:21:15 UTC

[GitHub] [incubator-doris] francisoliverlee opened a new issue #4414: [FE] fe out of service when some be dead

francisoliverlee opened a new issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414


   **Describe the bug**
   The FE is shown as not alive on the FE UI console when some BEs are dead.
   
   **Expected behavior**
   The FEs keep running normally even when a BE is dead.
   
   **Screenshots**
   FE log
   
   ```java
   2020-08-21 02:05:41,172 WARN 35 [TabletStatMgr.runAfterCatalogReady():69] task exec error. backend[10007]
   org.apache.thrift.transport.TTransportException: java.net.ConnectException: 拒绝连接 (Connection refused)
   	at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[libthrift-0.9.3.jar:0.9.3]
   	at org.apache.doris.common.GenericPool$ThriftClientFactory.create(GenericPool.java:128) ~[palo-fe.jar:?]
   	at org.apache.doris.common.GenericPool$ThriftClientFactory.create(GenericPool.java:113) ~[palo-fe.jar:?]
   	at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62) ~[commons-pool2-2.2.jar:2.2]
   	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1012) ~[commons-pool2-2.2.jar:2.2]
   	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:356) ~[commons-pool2-2.2.jar:2.2]
   	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:277) ~[commons-pool2-2.2.jar:2.2]
   	at org.apache.doris.common.GenericPool.borrowObject(GenericPool.java:85) ~[palo-fe.jar:?]
   	at org.apache.doris.catalog.TabletStatMgr.runAfterCatalogReady(TabletStatMgr.java:61) [palo-fe.jar:?]
   	at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:?]
   	at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:?]
   Caused by: java.net.ConnectException: 拒绝连接 (Connection refused)
   	at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_251]
   	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_251]
   	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_251]
   	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_251]
   	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_251]
   	at java.net.Socket.connect(Socket.java:606) ~[?:1.8.0_251]
   	at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[libthrift-0.9.3.jar:0.9.3]
   	... 10 more
   
   2020-08-21 02:05:40,284 WARN 73 [BDBEnvironment.openDatabase():255] get exception when try to close previously opened bdb database. ignore it
   com.sleepycat.je.rep.DatabasePreemptedException: (JE 7.3.7) (JE 7.3.7) Database 4538523 has been forcibly closed in order to apply a replicated remove operation.  This Database and all associated Cursors must be closed.  All associated Transactions must be aborted.
   	at com.sleepycat.je.rep.DatabasePreemptedException.wrapSelf(DatabasePreemptedException.java:113) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.Database.checkOpen(Database.java:2274) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.Database.getDatabaseName(Database.java:2046) ~[je-7.3.7.jar:7.3.7]
   	at org.apache.doris.journal.bdbje.BDBEnvironment.openDatabase(BDBEnvironment.java:231) [palo-fe.jar:?]
   	at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:269) [palo-fe.jar:?]
   	at org.apache.doris.persist.EditLog.getMaxJournalId(EditLog.java:94) [palo-fe.jar:?]
   	at org.apache.doris.catalog.Catalog.getMaxJournalId(Catalog.java:4661) [palo-fe.jar:?]
   	at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2385) [palo-fe.jar:?]
   	at org.apache.doris.catalog.Catalog$3.runOneCycle(Catalog.java:2190) [palo-fe.jar:?]
   	at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:?]
   Caused by: com.sleepycat.je.rep.DatabasePreemptedException: (JE 7.3.7) Database 4538523 has been forcibly closed in order to apply a replicated remove operation.  This Database and all associated Cursors must be closed.  All associated Transactions must be aborted.
   	at com.sleepycat.je.rep.impl.RepImpl.createDatabasePreemptedException(RepImpl.java:2008) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.rep.impl.RepImpl.createDatabasePreemptedException(RepImpl.java:143) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.Database.setPreempted(Database.java:469) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.DbInternal.setPreempted(DbInternal.java:58) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.dbi.DbTree.lockNameLN(DbTree.java:972) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.dbi.DbTree.doRemoveDb(DbTree.java:1172) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.dbi.DbTree.removeReplicaDb(DbTree.java:1239) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.rep.impl.node.Replay.applyNameLN(Replay.java:872) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.rep.impl.node.Replay.replayEntry(Replay.java:598) ~[je-7.3.7.jar:7.3.7]
   	at com.sleepycat.je.rep.impl.node.Replica$ReplayThread.run(Replica.java:1213) ~[je-7.3.7.jar:7.3.7]
   2020-08-21 02:05:01,916 WARN 91 [DefaultChannelPipeline.onUnhandledInboundException():1164] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
   java.io.IOException: 打开的文件过多 (Too many open files)
   	at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[?:1.8.0_251]
   	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) ~[?:1.8.0_251]
   	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[?:1.8.0_251]
   	at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:110) ~[netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:107) ~[netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_251]
   	at io.netty.util.internal.SocketUtils.accept(SocketUtils.java:107) ~[netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:145) ~[netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:75) [netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) [netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635) [netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552) [netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) [netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044) [netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.42.Final.jar:4.1.42.Final]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_251]
   ```
   
   
   
   
   
   **Desktop (please complete the following information):**
    - OS: CentOS 7
    - Version: 0.12
   
   
   **Additional context**
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679970631


   @wuyunfeng I finally found it: it's nginx.
   nginx was holding a lot of CLOSE_WAIT TCP connections, which made the FE's fd count hit the max open-file limit of 65535. I changed the nginx config and am waiting to verify the fix.
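
The limit mentioned above can be read from `/proc/<pid>/limits` on Linux. The following is a minimal sketch, not part of Doris, of how one might parse the "Max open files" row and compute how many descriptors are still available. The function names `parse_max_open_files` and `fd_headroom` are hypothetical, and the parsing assumes the usual column layout of that file (limit name, soft limit, hard limit, units).

```python
import re
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) 'Max open files' limits from /proc/<pid>/limits text.

    A limit reported as 'unlimited' is returned as None.
    """
    match = re.search(r"^Max open files\s+(\S+)\s+(\S+)", limits_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Max open files' line found")

    def to_int(value: str) -> Optional[int]:
        return None if value == "unlimited" else int(value)

    return to_int(match.group(1)), to_int(match.group(2))


def fd_headroom(open_fds: int, soft_limit: Optional[int]) -> Optional[int]:
    """Number of descriptors still available, or None when there is no limit."""
    if soft_limit is None:
        return None
    return max(soft_limit - open_fds, 0)
```

A monitoring script could feed it the contents of `/proc/<fe pid>/limits` together with the number of entries in `/proc/<fe pid>/fd`, and alert when the headroom gets small.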




[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-678069116


   Is it possible that when a BE is dead, the FE keeps holding its connections, so a lot of dead connections are never released?




[GitHub] [incubator-doris] wuyunfeng commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679016371


   Can you provide a summary of the output of `ll /proc/{fe pid}/fd`?
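
One way to produce such a summary is to group the descriptors by the kind of object they point to. The sketch below is a hypothetical helper, not a Doris tool. It assumes a Linux host, where each entry in `/proc/<pid>/fd` is a symlink whose target looks like `socket:[inode]`, `pipe:[inode]`, `anon_inode:...`, or a file path. The names `classify_fd_target` and `summarize_fds` are made up for illustration.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map the symlink target of a /proc/<pid>/fd entry to a coarse type."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    if target.startswith("/"):
        return "file"
    return "other"


def summarize_fds(pid: int) -> Counter:
    """Count the open descriptors of a process by type (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts
```

Calling `summarize_fds(pid).most_common()` with the FE process ID gives a short per-type count instead of tens of thousands of lines. Reading another user's process usually requires running as that user or as root.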




[GitHub] [incubator-doris] francisoliverlee closed issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
francisoliverlee closed issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414


   




[GitHub] [incubator-doris] morningman commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
morningman commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-678066400


   `java.io.IOException: 打开的文件过多`
   
   Looks like you hit `too many open files`? Have you deployed the FE and BE on the same host?




[GitHub] [incubator-doris] wuyunfeng closed issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
wuyunfeng closed issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414


   




[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679853250


   `ll /proc/26123/fd` shows 65535 entries, and
   
   `netstat -antp | grep 9030` shows 300+ connections from other FE instances, all in CLOSE_WAIT, although there are only 6 FEs.
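
To turn the `netstat` output into per-state counts rather than reading it by eye, a small parser can help. The sketch below is an illustration only. It assumes the usual `netstat -ant` column order (protocol, receive queue, send queue, local address, foreign address, state) and, unlike `grep 9030`, it matches only the local port. The function name `count_tcp_states` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_tcp_states(netstat_lines: Iterable[str], local_port: int) -> Counter:
    """Count TCP connection states for sockets whose local port matches."""
    suffix = f":{local_port}"
    states = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State ...
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP rows
        if fields[3].endswith(suffix):
            states[fields[5]] += 1
    return states
```

Passing the lines of saved `netstat -antp` output and the port `9030` returns a counter keyed by state, so a large CLOSE_WAIT count stands out immediately.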




[GitHub] [incubator-doris] wuyunfeng commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679742312


   @francisoliverlee Hmm, I am going to close this issue. If you run into the same problem again, feel free to reopen it.
   Next time, please provide some statistics about the open `fd`s.




[GitHub] [incubator-doris] wuyunfeng commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-680109509


   @francisoliverlee That's nice. Maybe you can set a smaller value for `keepalive_timeout`?




[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-680412782


   > @francisoliverlee That's nice. Maybe you can set a smaller value for `keepalive_timeout`?
   
   OK, I will try it. Thanks!




[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-678066662


   > `java.io.IOException: 打开的文件过多`
   > 
   > Looks like you hit `too many open files`? Have you deployed the FE and BE on the same host?
   
   No.




[GitHub] [incubator-doris] wuyunfeng commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679886696


   What type of `fd` is the rest (about `65535 - 300+`)?




[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679896887


   socket




[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead

Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679449556


   > Can you provide a summary of the output of `ll /proc/{fe pid}/fd`?
   
   It has been so long that we have already restarted the cluster to recover.

