Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/06/24 08:47:44 UTC

[GitHub] [pulsar] andrekramer1 opened a new issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

andrekramer1 opened a new issue #11070:
URL: https://github.com/apache/pulsar/issues/11070


   **Describe the bug**
   Running the Kubernetes Helm chart with a 2.8.0 Pulsar image, Zookeeper pod 0 starts but fails repeatedly, and the rest of the Pulsar cluster waits on Zookeeper initialization. The Zookeeper health check is not passing: "echo ruok | nc localhost 2181" does not return "imok" and, I think, just hangs, so the Kubernetes health check fails. Zookeeper also complains about not resolving zookeeper-1 and zookeeper-2, as usual, but the zookeeper-0 log has a new NullPointerException that we have not seen before:
   
   10:58:55.990 [epollEventLoopGroup-4-2] WARN  org.apache.zookeeper.server.NettyServerCnxnFactory - Exception caught
   java.lang.NullPointerException: null
                  at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelActive(NettyServerCnxnFactory.java:258) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:522) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [io.netty-netty-transport-native-epoll-4.1.63.Final-linux-x86_64.jar:4.1.63.Final]
                  at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
                  at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
                  at java.lang.Thread.run(Thread.java:829) [?:?]
   
   **To Reproduce**
   Deploy the Pulsar 2.8.0 Docker image on Kubernetes with the Helm chart. After changing just the Zookeeper image to 2.7.0, the cluster came up.
   
   **Additional context**
   
   Possibly Java 11 is part of the problem, or is it just a bad version of Zookeeper?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-937264170


   Good catch!
   
   Can you please follow up with the Zookeeper project?
   If there is a problem in Zookeeper, we must fix it.
   I will be happy to help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] andrekramer1 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
andrekramer1 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-936457985


   Some further investigation found that the first Zookeeper pod created for the Kubernetes StatefulSet does not respond to the readiness/liveness probe. The probe uses the "ruok" command, and the server replies by closing the connection (as Zookeeper is not up and running), so the second and third replicas are never created. Somehow Zookeeper has stopped responding while initializing / creating a quorum. This can be confirmed by setting the enabled flag on the Zookeeper readiness and liveness probes to false in the Helm chart: with the probes disabled, I managed to initialize a 3-node cluster.
   
   I created a debug branch of Zookeeper, modified to respond to ruok and other client requests even when not fully initialized. With these changes it is also possible to bring up the Zookeepers and the Pulsar cluster with the probes enabled. The branch is here: https://github.com/andrekramer1/zookeeper/tree/early-ruok
   
   It would be possible to create a pull request from this, but the implications of allowing client connections while Zookeeper is initializing would need to be considered. Hopefully the change list can help fix this issue.
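
For anyone reproducing this without `nc`, the probe behaviour described above can be sketched in Python. This is a minimal sketch, not the chart's actual probe script: the function name `zk_ruok` and the result labels are made up for illustration, and it assumes Zookeeper accepts four-letter commands on the given port. The explicit timeout separates a server that hangs from one that closes the connection without replying.

```python
import socket


def zk_ruok(host="localhost", port=2181, timeout=2.0):
    """Send the four-letter command 'ruok' and classify what comes back.

    Returns one of (labels are illustrative only):
      "imok"        - server answered as a healthy Zookeeper would
      "closed"      - server closed the connection without a reply
      "timeout"     - no answer within `timeout` seconds (probe hangs)
      "unreachable" - could not connect at all
      "unexpected"  - some other reply
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            while True:
                data = sock.recv(64)
                if not data:  # peer closed the connection
                    break
                chunks.append(data)
            reply = b"".join(chunks)
    except socket.timeout:
        return "timeout"
    except ConnectionResetError:
        return "closed"
    except OSError:
        return "unreachable"

    if reply == b"imok":
        return "imok"
    if reply == b"":
        return "closed"
    return "unexpected"


if __name__ == "__main__":
    print(zk_ruok())
```

Any result other than "imok" corresponds to a failing probe; "closed" matches the behaviour reported here, while "timeout" would match the hang suspected in the original report.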


[GitHub] [pulsar] andrekramer1 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
andrekramer1 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-937668177


   @eolivelli I created a PR, https://github.com/apache/zookeeper/pull/1770, over on the Zookeeper GitHub to start the process of looking for a fix.


[GitHub] [pulsar] merlimat commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
merlimat commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-936539979


   @andrekramer1 The problem is related to a bug in ZooKeeper when running with NettyServerCnxnFactory, which is enabled in the Helm chart (https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/templates/zookeeper-configmap.yaml#L33). We should revert the Helm chart to use the default ZooKeeper server factory.
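
To check which connection factory a given set of JVM options selects, something like the following could be used. It is a rough sketch under assumptions: ZooKeeper reads the factory class from the `zookeeper.serverCnxnFactory` system property and falls back to the NIO factory when it is unset. The helper name `server_cnxn_factory` and the sample option string are hypothetical and not taken from the chart.

```python
import shlex

# Fully qualified class names of the two ZooKeeper server connection factories.
NETTY = "org.apache.zookeeper.server.NettyServerCnxnFactory"
NIO = "org.apache.zookeeper.server.NIOServerCnxnFactory"

# System property ZooKeeper consults when choosing the factory.
PROP = "zookeeper.serverCnxnFactory"


def server_cnxn_factory(jvm_opts: str) -> str:
    """Return the factory class selected by the given JVM options.

    If the property appears more than once, the last occurrence is used;
    if it does not appear, the NIO default is returned.
    """
    selected = NIO
    prefix = f"-D{PROP}="
    for token in shlex.split(jvm_opts):
        if token.startswith(prefix):
            selected = token[len(prefix):]
    return selected


if __name__ == "__main__":
    # Hypothetical option string, for illustration only.
    opts = f"-Xms64m -Xmx128m -D{PROP}={NETTY}"
    print(server_cnxn_factory(opts))
```

If this reports the Netty factory for the options your Zookeeper pods actually run with, the deployment is exposed to the bug discussed here; removing the property (or setting it to the NIO class) selects the default factory.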


[GitHub] [pulsar] eolivelli edited a comment on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli edited a comment on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-898470922


   > Possibly the chart you tested did the same?
   I used that Helm Chart by overriding the version of the docker image for all the components (zookeeper, broker, bookie...)
   And I had verified that the version of ZK and Pulsar was good


[GitHub] [pulsar] jmhublar commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
jmhublar commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-897097533


   Yeah, this is not working at all.


[GitHub] [pulsar] andrekramer1 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
andrekramer1 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-936676027


   @merlimat @eolivelli A quick test confirmed that the NIO server factory works. Note that I only tested that a Pulsar cluster can be created, not that it works/performs under any load.


[GitHub] [pulsar] andrekramer1 edited a comment on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
andrekramer1 edited a comment on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-898469782


   I see the diff you point to above explicitly sets Zookeeper to 2.7.2 until this issue here is fixed:
   ```
   images:
     # bump to 2.8 after https://github.com/apache/bookkeeper/pull/2740 is fixed
     zookeeper:
       repository: apachepulsar/pulsar-all
       tag: 2.7.2
       pullPolicy: IfNotPresent
     bookie:
       repository: apachepulsar/pulsar-all
       tag: 2.7.2
       pullPolicy: IfNotPresent
   ```
   Possibly the chart you tested did the same?


[GitHub] [pulsar] eolivelli commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-897759754


   I have run my tests and with this helm chart ZooKeeper works well with 3 pods
   https://github.com/datastax/pulsar-helm-chart


[GitHub] [pulsar] andrekramer1 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
andrekramer1 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-898464140


   Our chart does not have SSL enabled and I tried adding:
   PULSAR_EXTRA_OPTS: >
         -Dzookeeper.ssl.hostnameVerification=false
         -Dzookeeper.ssl.quorum.hostnameVerification=false
   But it still fails on a 3-node cluster with the same NullPointerException. Possibly some startup race condition rather than SSL configuration?
   


[GitHub] [pulsar] eolivelli commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-915133204


   The problem is in the helm chart, not in Pulsar itself


[GitHub] [pulsar] andrekramer1 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
andrekramer1 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-915181724


   @eolivelli when you say the problem is "in the helm chart", what exactly do you think is the issue? As far as I could tell it's a timing issue in starting 3 Zookeepers when the cluster is created. It may work with 1 Zookeeper, and the only smoking gun I saw was the NullPointerException in the Zookeeper Kubernetes pod logs. 2.8.0 uses Zookeeper 3.6.3 and won't start, but using 2.7.0, which has Zookeeper 3.5.7 (only for Zookeeper pods), starts reliably.


[GitHub] [pulsar] andrekramer1 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
andrekramer1 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-1022396365


   @lhotari sounds reasonable that it would help. Still running 1.16 for dev locally and the issue is being fixed in Zookeeper under: https://github.com/apache/zookeeper/pull/1800


[GitHub] [pulsar] lhotari commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-1022179321


   @andrekramer1 do you have a chance to validate whether https://github.com/apache/pulsar-helm-chart/pull/214 changes would resolve the issue?


[GitHub] [pulsar] eolivelli commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-897683866


   Which Helm chart are you using?
   
   The "official" Apache Pulsar Helm chart was never updated to Pulsar 2.8.0;
   see this PR, which has not been merged:
   https://github.com/apache/pulsar-helm-chart/pull/130
   


[GitHub] [pulsar] lhotari commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-979507609


   Zookeeper issue is https://issues.apache.org/jira/browse/ZOOKEEPER-3988


[GitHub] [pulsar] codelipenghui closed issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
codelipenghui closed issue #11070:
URL: https://github.com/apache/pulsar/issues/11070


   


[GitHub] [pulsar] eolivelli commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-1017381692


   @andrekramer1 I created an alternative, simpler PR that does not change the previous behaviour but fixes the NPE:
   [1798](https://github.com/apache/zookeeper/pull/1798).
   
   Can you please take a look?
   Is it enough to fix your problem?
   If the answer is "yes", then it is easier to commit the patch and see it released soon.


[GitHub] [pulsar] truonglenhut16111989 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
truonglenhut16111989 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-915017513


   I faced the same issue when I tried with 2.8.0. Can this bug be fixed in branch 2.8?


[GitHub] [pulsar] eolivelli commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-897567591


   @hangc0276 I believe that we should fix this problem before cutting 2.8.1


[GitHub] [pulsar] hangc0276 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-897676723


   Thanks for @eolivelli's help. From the exception stack, it looks like something is wrong in Zookeeper or the Helm chart. Would you please paste the detailed Zookeeper logs? @andrekramer1 @jmhublar @mnit016


[GitHub] [pulsar] andrekramer1 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
andrekramer1 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-882307920


   A workaround is to use a 2.7.x image for just Zookeeper.


[GitHub] [pulsar] lhotari commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-979505867


   I created this fix for the Apache Pulsar Helm Chart: https://github.com/apache/pulsar-helm-chart/pull/180


[GitHub] [pulsar] andrekramer1 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
andrekramer1 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-936601691


   @merlimat Yes, the fix I worked up is in the Netty server code, so another server factory would probably sidestep this.


[GitHub] [pulsar] eolivelli commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-897664698


   the NPE is here
   https://github.com/apache/zookeeper/blob/00071dd3532f47c16e0973cc5af03cccec7ba9ab/zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java#L258
   
   probably:
   - there is another error in the logs
   - the server is taking a long time to come up


[GitHub] [pulsar] eolivelli commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-897690248


   I am testing it now with my helm chart....stay tuned


[GitHub] [pulsar] mnit016 commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
mnit016 commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-881169710


   I've faced the same issue here when upgrading to 2.8. Is there any update?


[GitHub] [pulsar] eolivelli commented on issue #11070: Zookeeper in 3 node Kubernetes cluster does not pass heath check in 2.8.0

Posted by GitBox <gi...@apache.org>.
eolivelli commented on issue #11070:
URL: https://github.com/apache/pulsar/issues/11070#issuecomment-897695832


   @andrekramer1 this line seems relevant for upgrading ZK and Pulsar 2.8
   
   https://github.com/apache/pulsar-helm-chart/pull/130/files#diff-7b27cbfd585af4cb29a517c905cf423c776a5a0d376940b1397e81b8daffd2b7R42
   
   do you have TLS enabled ?

