Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/21 00:51:59 UTC
[GitHub] [druid] gianm commented on issue #12904: Promote druid-kubernetes-extensions out of experimental status

gianm commented on issue #12904:
URL: https://github.com/apache/druid/issues/12904#issuecomment-1253063416

   Slack thread mentioning an issue: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1663715405113769. Reproducing some info here.
   
   > Since switching to the Kubernetes extension instead of ZooKeeper, I have been seeing an issue and am curious whether anyone else has seen it. We are running 0.23.0 with indexers instead of middlemanagers. When an indexer pod goes away, we begin seeing errors like the following in the coordinator logs (stack trace and details in thread).
   
   ```
   {
     "level": "ERROR",
     "thread": "HttpServerInventoryView-4",
     "message": "failed to get sync response from [http://10.4.132.249:8091/_1663714827177]. Return code [0], Reason: [null]",
     "exception": {
       "exception_class": "org.jboss.netty.channel.ChannelException",
       "exception_message": "Faulty channel in resource pool",
       "stacktrace": "org.jboss.netty.channel.ChannelException: Faulty channel in resource pool\n\tat org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:131)\n\tat org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sync(ChangeRequestHttpSyncer.java:218)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.4.132.249:8091\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)\n\tat org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)\n\tat org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)\n\tat org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)\n\t... 3 more\n"
     },
     "hostName": "storage--druid-coordinator-8454fd4cf5-zz94r"
   }
   ```
   
   
   ```
   org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
     at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:131)
     at org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sync(ChangeRequestHttpSyncer.java:218)
     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
     at java.base/java.lang.Thread.run(Thread.java:829)
     Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.4.132.249:8091
     at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)
     at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
     at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
     ... 3 more
   ```
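
When triaging many such entries, it can help to pull the innermost `Caused by:` line out of each JSON log record, since that is where the connect timeout to the departed pod shows up. The following is a minimal sketch, assuming the log record has the `exception.stacktrace` field shown above; the helper name `root_cause` and the shortened sample record are hypothetical and not part of Druid.

```python
import json


def root_cause(log_line: str) -> str:
    """Return the innermost cause from a JSON log record's stack trace.

    Falls back to the first line of the trace when there is no
    "Caused by:" chain.
    """
    entry = json.loads(log_line)
    trace = entry["exception"]["stacktrace"]
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    causes = [ln for ln in lines if ln.startswith("Caused by:")]
    if causes:
        # The last "Caused by:" line is the deepest cause in the chain.
        return causes[-1][len("Caused by:"):].strip()
    return lines[0]


# Shortened version of the record above (most stack frames omitted).
sample = json.dumps({
    "level": "ERROR",
    "thread": "HttpServerInventoryView-4",
    "exception": {
        "exception_class": "org.jboss.netty.channel.ChannelException",
        "stacktrace": (
            "org.jboss.netty.channel.ChannelException: Faulty channel in resource pool\n"
            "\tat org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:131)\n"
            "Caused by: org.jboss.netty.channel.ConnectTimeoutException: "
            "connection timed out: /10.4.132.249:8091\n"
            "\tat org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)\n"
        ),
    },
})

print(root_cause(sample))
# org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.4.132.249:8091
```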
   
   > It appears that once it gets into this state, it will continue to retry indefinitely, and eventually the coordinator becomes bogged down and unresponsive.
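
To illustrate the contrast with the unbounded retrying reported above, here is a generic sketch of a sync loop that backs off exponentially and gives up after a fixed number of attempts. This is not Druid's `ChangeRequestHttpSyncer` logic and makes no claim about how Druid behaves or should be fixed; `backoff_delays`, `sync_with_limit`, and all parameter values are hypothetical names chosen for the example.

```python
def backoff_delays(base_s: float, cap_s: float, max_attempts: int):
    """Yield one wait time per attempt: base_s * 2**n, capped at cap_s."""
    for attempt in range(max_attempts):
        yield min(cap_s, base_s * (2 ** attempt))


def sync_with_limit(sync_once, delays, sleep) -> bool:
    """Call sync_once until it succeeds or the delays run out.

    Sleeps for the next delay after each failed attempt. Returns True on
    success and False once every attempt has failed, at which point the
    caller can drop the unreachable server instead of retrying forever.
    """
    for delay in delays:
        try:
            sync_once()
            return True
        except ConnectionError:
            sleep(delay)
    return False


def unreachable_pod():
    # Stand-in for a sync request to a pod that no longer exists.
    raise ConnectionError("connection timed out")


waits = []  # record the waits instead of actually sleeping
ok = sync_with_limit(unreachable_pod, backoff_delays(1.0, 8.0, 5), waits.append)
print(ok, waits)
# False [1.0, 2.0, 4.0, 8.0, 8.0]
```

In a real service `sleep` would be `time.sleep` or a scheduler delay; passing it in keeps the sketch testable without waiting.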


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

