Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/08/21 06:21:15 UTC
[GitHub] [incubator-doris] francisoliverlee opened a new issue #4414: [FE] fe out of service when some be dead
francisoliverlee opened a new issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414
**Describe the bug**
The FE shows as not alive in the FE web UI console when some BEs are dead.
**Expected behavior**
The FEs keep running normally even when some BEs are dead.
**Screenshots**
FE log
```java
2020-08-21 02:05:41,172 WARN 35 [TabletStatMgr.runAfterCatalogReady():69] task exec error. backend[10007]
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 拒绝连接 (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[libthrift-0.9.3.jar:0.9.3]
at org.apache.doris.common.GenericPool$ThriftClientFactory.create(GenericPool.java:128) ~[palo-fe.jar:?]
at org.apache.doris.common.GenericPool$ThriftClientFactory.create(GenericPool.java:113) ~[palo-fe.jar:?]
at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62) ~[commons-pool2-2.2.jar:2.2]
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1012) ~[commons-pool2-2.2.jar:2.2]
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:356) ~[commons-pool2-2.2.jar:2.2]
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:277) ~[commons-pool2-2.2.jar:2.2]
at org.apache.doris.common.GenericPool.borrowObject(GenericPool.java:85) ~[palo-fe.jar:?]
at org.apache.doris.catalog.TabletStatMgr.runAfterCatalogReady(TabletStatMgr.java:61) [palo-fe.jar:?]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:?]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:?]
Caused by: java.net.ConnectException: 拒绝连接 (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_251]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_251]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_251]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_251]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_251]
at java.net.Socket.connect(Socket.java:606) ~[?:1.8.0_251]
at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[libthrift-0.9.3.jar:0.9.3]
... 10 more
2020-08-21 02:05:40,284 WARN 73 [BDBEnvironment.openDatabase():255] get exception when try to close previously opened bdb database. ignore it
com.sleepycat.je.rep.DatabasePreemptedException: (JE 7.3.7) (JE 7.3.7) Database 4538523 has been forcibly closed in order to apply a replicated remove operation. This Database and all associated Cursors must be closed. All associated Transactions must be aborted.
at com.sleepycat.je.rep.DatabasePreemptedException.wrapSelf(DatabasePreemptedException.java:113) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.checkOpen(Database.java:2274) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.getDatabaseName(Database.java:2046) ~[je-7.3.7.jar:7.3.7]
at org.apache.doris.journal.bdbje.BDBEnvironment.openDatabase(BDBEnvironment.java:231) [palo-fe.jar:?]
at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:269) [palo-fe.jar:?]
at org.apache.doris.persist.EditLog.getMaxJournalId(EditLog.java:94) [palo-fe.jar:?]
at org.apache.doris.catalog.Catalog.getMaxJournalId(Catalog.java:4661) [palo-fe.jar:?]
at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2385) [palo-fe.jar:?]
at org.apache.doris.catalog.Catalog$3.runOneCycle(Catalog.java:2190) [palo-fe.jar:?]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:?]
Caused by: com.sleepycat.je.rep.DatabasePreemptedException: (JE 7.3.7) Database 4538523 has been forcibly closed in order to apply a replicated remove operation. This Database and all associated Cursors must be closed. All associated Transactions must be aborted.
at com.sleepycat.je.rep.impl.RepImpl.createDatabasePreemptedException(RepImpl.java:2008) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.createDatabasePreemptedException(RepImpl.java:143) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.setPreempted(Database.java:469) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.DbInternal.setPreempted(DbInternal.java:58) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.DbTree.lockNameLN(DbTree.java:972) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.DbTree.doRemoveDb(DbTree.java:1172) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.dbi.DbTree.removeReplicaDb(DbTree.java:1239) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.node.Replay.applyNameLN(Replay.java:872) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.node.Replay.replayEntry(Replay.java:598) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.node.Replica$ReplayThread.run(Replica.java:1213) ~[je-7.3.7.jar:7.3.7]
2020-08-21 02:05:01,916 WARN 91 [DefaultChannelPipeline.onUnhandledInboundException():1164] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: 打开的文件过多 (Too many open files)
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[?:1.8.0_251]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) ~[?:1.8.0_251]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[?:1.8.0_251]
at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:110) ~[netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.util.internal.SocketUtils$5.run(SocketUtils.java:107) ~[netty-all-4.1.42.Final.jar:4.1.42.Final]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_251]
at io.netty.util.internal.SocketUtils.accept(SocketUtils.java:107) ~[netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:145) ~[netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:75) [netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) [netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635) [netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552) [netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) [netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044) [netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.42.Final.jar:4.1.42.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.42.Final.jar:4.1.42.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_251]
```
**Desktop (please complete the following information):**
- OS: CentOS 7
- Version: 0.12
**Additional context**
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679970631
@wuyunfeng I finally found it: it's nginx.
nginx was holding a lot of CLOSE_WAIT TCP connections, which pushed the FE's fd count up to the max open-file limit of 65535. I have changed the nginx config and am waiting to verify the result.
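To confirm which open-file limit a running process is actually subject to, one option on Linux is to read `/proc/<pid>/limits`. The following is a minimal sketch, not part of Doris; the function names `parse_max_open_files` and `read_max_open_files` are hypothetical helpers introduced here for illustration.

```python
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Extract the (soft, hard) 'Max open files' limits from /proc/<pid>/limits text.

    A limit reported as 'unlimited' is returned as None.
    """
    def to_int(value: str) -> Optional[int]:
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: 'Max', 'open', 'files', <soft>, <hard>, 'files'
            fields = line.split()
            return to_int(fields[3]), to_int(fields[4])
    raise ValueError("no 'Max open files' line found")


def read_max_open_files(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Read the limits of a live process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

Calling `read_max_open_files(<fe pid>)` on the FE host should report the limit the FE process is running under, which may differ from what `ulimit -n` shows in an interactive shell.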
[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-678069116
Is it possible that when a BE dies, the FE keeps holding its connections, so a lot of dead connections are never released?
[GitHub] [incubator-doris] wuyunfeng commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679016371
Can you provide a summary of the output of `ll /proc/{fe pid}/fd`?
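When the fd listing is tens of thousands of lines long, a per-type count is easier to share than the raw output. Below is a minimal sketch of one way to produce such a summary on Linux by resolving the symlinks under `/proc/<pid>/fd`. The helper names `classify_fd_target` and `summarize_fds` are hypothetical and not part of Doris, and the script needs permission to read the target process's fd directory.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse fd type."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    if target.startswith("/"):
        return "file"
    return "other"


def summarize_fds(pid: int) -> Counter:
    """Count the open fds of a process by type (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink(); skip it.
            continue
        counts[classify_fd_target(target)] += 1
    return counts
```

For example, `summarize_fds(<fe pid>).most_common()` lists the fd types ordered by count, which shows at a glance whether sockets or regular files dominate.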
[GitHub] [incubator-doris] francisoliverlee closed issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
francisoliverlee closed issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414
[GitHub] [incubator-doris] morningman commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
morningman commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-678066400
`java.io.IOException: 打开的文件过多`
Looks like you hit `too many open files`. Have you deployed the FE and BE on the same host?
[GitHub] [incubator-doris] wuyunfeng closed issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
wuyunfeng closed issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414
[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679853250
`ll /proc/26123/fd` shows a count of 65535, and
`netstat -antp | grep 9030` shows 300+ connections from other FE instances, all in CLOSE_WAIT, even though there are only 6 FEs.
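A per-state tally makes this kind of report easier to read than an eyeballed count. The sketch below parses the text output of `netstat -ant` (or `-antp`) and counts TCP states for connections whose local port matches. It is only an illustration: `count_states_for_port` is a hypothetical helper, and unlike `grep 9030` it matches the local address only, not the foreign one.

```python
from collections import Counter


def count_states_for_port(netstat_output: str, port: int) -> Counter:
    """Count TCP connection states for sockets bound to the given local port.

    Expects the column layout printed by `netstat -ant`:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
    """
    suffix = f":{port}"
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        if local_addr.endswith(suffix):
            counts[state] += 1
    return counts
```

Feeding it the captured output, for instance via `subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout`, and calling `count_states_for_port(output, 9030)` gives a count per state such as `CLOSE_WAIT` or `ESTABLISHED`.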
[GitHub] [incubator-doris] wuyunfeng commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679742312
@francisoliverlee Hmm, I am going to close this issue. If you run into the same problem again, feel free to reopen it.
Next time, please provide some statistics about the open `fd`s.
[GitHub] [incubator-doris] wuyunfeng commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-680109509
@francisoliverlee That's nice. Maybe you can set a smaller value for `keepalive_timeout`?
[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-680412782
> @francisoliverlee That's nice. Maybe you can set a smaller value for `keepalive_timeout`?
OK, I will try it. Thanks!
[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-678066662
> `java.io.IOException: 打开的文件过多`
>
> Looks like you hit `too many open files`. Have you deployed the FE and BE on the same host?
No.
[GitHub] [incubator-doris] wuyunfeng commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679886696
What type of `fd` are the remaining `65535 - 300+`?
[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679896887
socket
[GitHub] [incubator-doris] francisoliverlee commented on issue #4414: [FE] fe out of service when some be dead
Posted by GitBox <gi...@apache.org>.
francisoliverlee commented on issue #4414:
URL: https://github.com/apache/incubator-doris/issues/4414#issuecomment-679449556
> Can you provide a summary of the output of `ll /proc/{fe pid}/fd`?
It has been so long that we have already restarted the cluster to recover.