Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/06/14 09:11:06 UTC

Slack digest for #general - 2020-06-14

2020-06-13 12:32:58 UTC - megachucky: @megachucky has joined the channel
----
2020-06-13 13:01:32 UTC - Sankararao Routhu: <!here> do we need to restart proxies after restarting brokers? My publisher connections are failing through proxy when brokers are restarted unless I restart my proxies
----
2020-06-13 17:01:51 UTC - Asaf Mesika: Can’t you get lag count per topic?
----
2020-06-13 17:03:34 UTC - Gilles Barbier: lag count?
----
2020-06-13 17:04:37 UTC - Asaf Mesika: In Kafka, for example, per consumer you can get the number of records which haven’t been consumed yet
----
2020-06-13 17:05:01 UTC - Asaf Mesika: I presumed it is the same in Pulsar
----
2020-06-13 17:07:03 UTC - Gilles Barbier: Not sure about that - but we obtain counters for each job status using pulsar functions and counters (<https://pulsar.apache.org/docs/en/2.5.2/functions-develop/#api>)
----
2020-06-13 17:19:50 UTC - Asaf Mesika: But pulsar functions have a different subscription and rate of consumption compared to the job workers, no?
----
2020-06-13 17:28:57 UTC - Gilles Barbier: In our case, a "dispatchJob" message is handled by a function before being sent to workers. This function maintains a state describing each job's processing. Workers also send a status back to this function. It's more than a simple task queue
----
2020-06-13 18:48:53 UTC - Rutvij: @Rutvij has joined the channel
----
2020-06-13 20:38:37 UTC - Marcio Martins: Anyone knows what would cause this?
```20:28:26.093 [BookKeeperClientScheduler-OrderedScheduler-0-0] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Unexpected exception while handling joining bookie bookie-1.bookie.pulsar.svc.cluster.local:3181
java.lang.NullPointerException: null
	at org.apache.bookkeeper.net.NetUtils.resolveNetworkLocation(NetUtils.java:77) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.resolveNetworkLocation(TopologyAwareEnsemblePlacementPolicy.java:779) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.createBookieNode(TopologyAwareEnsemblePlacementPolicy.java:775) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.handleBookiesThatJoined(TopologyAwareEnsemblePlacementPolicy.java:707) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.handleBookiesThatJoined(RackawareEnsemblePlacementPolicyImpl.java:79) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.handleBookiesThatJoined(RackawareEnsemblePlacementPolicy.java:246) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.onClusterChanged(TopologyAwareEnsemblePlacementPolicy.java:654) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.onClusterChanged(RackawareEnsemblePlacementPolicyImpl.java:79) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.onClusterChanged(RackawareEnsemblePlacementPolicy.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.BookieWatcherImpl.processWritableBookiesChanged(BookieWatcherImpl.java:171) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.client.BookieWatcherImpl.lambda$initialBlockingBookieRead$1(BookieWatcherImpl.java:206) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.discover.ZKRegistrationClient$WatchTask.accept(ZKRegistrationClient.java:139) [org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.discover.ZKRegistrationClient$WatchTask.accept(ZKRegistrationClient.java:62) [org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) [?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) [?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) [?:1.8.0_242]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]```

----
2020-06-13 20:39:32 UTC - Marcio Martins: Before this, I had official zookeeper and bookkeeper clusters running with no issues - I switched to the `pulsar-all` images and am now getting this...
----
2020-06-13 20:49:23 UTC - Anup Ghatage: @Marcio Martins
Looks like we’re failing to resolve the hostname for the BookieSocketAddress of `bookie-1.bookie.pulsar.svc.cluster.local:3181`
If we dig deeper, we see that this `null` address is coming from:
```TopologyAwareEnsemblePlacementPolicy.java
L:649 joinedBookies = Sets.difference(writableBookies, oldBookieSet).immutableCopy();```
Guava's Sets.difference only returns items which are present in set1 but not in set2, and if you're getting null, it means both sets are the same.

You might just have all of the bookies in read-only mode and none in read-write mode.

If you have a moment we can side-bar and I can show you how to check if that’s the case.

Still, an NPE is something that is *not* expected. @Sijie Guo do you recommend we open a bug on this one?
----
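As an editorial aside: Guava's `Sets.difference(a, b)` returns a view of the elements of `a` that are not in `b`; when the two sets are equal it yields an empty set rather than `null`, so the `null` in the trace likely enters elsewhere (e.g. an unresolvable address). A minimal stdlib sketch of the same difference semantics (the `difference` helper below is illustrative, not Guava's implementation):

```java
import java.util.HashSet;
import java.util.Set;

public class SetDifferenceDemo {
    // Mimics the semantics of Guava's Sets.difference(a, b):
    // elements present in a but not in b (a stdlib copy, not a live view).
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> writable = Set.of("bookie-1:3181", "bookie-2:3181");
        Set<String> old = Set.of("bookie-1:3181");

        // bookie-2 joined the cluster
        System.out.println(difference(writable, old)); // [bookie-2:3181]

        // identical sets -> empty set, not null
        System.out.println(difference(old, old).isEmpty()); // true
    }
}
```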
2020-06-13 20:53:01 UTC - Marcio Martins: Yes, any help would be great!
----
2020-06-13 20:53:13 UTC - Marcio Martins: Thank you!
----
2020-06-13 20:53:37 UTC - Anup Ghatage: Can you please log into any bookie node and execute this shell command:

`bin/bookkeeper shell listbookies -h -rw`
----
2020-06-13 20:54:52 UTC - Marcio Martins: I get Fail to process command 'list'
----
2020-06-13 20:55:12 UTC - Marcio Martins: ```20:54:40.880 [main-EventThread] INFO  org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
ReadWrite Bookies :
20:54:41.022 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x300015d235e000b```

----
2020-06-13 20:55:41 UTC - Marcio Martins: Seems like there are no read-write bookies
----
2020-06-13 20:55:41 UTC - Anup Ghatage: And now try this:
`bin/bookkeeper shell listbookies -h -ro`
----
2020-06-13 20:55:58 UTC - Marcio Martins: No bookie exists!
----
2020-06-13 20:56:15 UTC - Anup Ghatage: Hmm just as I expected. The BookKeeper deployment has gone wrong
----
2020-06-13 20:56:41 UTC - Marcio Martins: All bookies are connected to zookeeper
----
2020-06-13 20:57:03 UTC - Marcio Martins: and I initialized the cluster with `initnewcluster`
----
2020-06-13 20:57:14 UTC - Marcio Martins: I think on the old deployment script I was using `shell metaformat`
----
2020-06-13 20:57:18 UTC - Marcio Martins: Could that be it?
----
2020-06-13 20:57:20 UTC - Anup Ghatage: So just to be clear, you are seeing nothing in read-write or read-only bookies
----
2020-06-13 20:57:31 UTC - Marcio Martins: yes, they are both empty
----
2020-06-13 20:57:51 UTC - Anup Ghatage: &gt;  I think on the old deployment script I was using `shell metaformat`
Not sure what the old deployment was doing.
----
2020-06-13 21:00:05 UTC - Marcio Martins: So this is the cluster initialization before the bookies were started for the first time:
```20:47:06.286 [main] INFO  org.apache.bookkeeper.discover.ZKRegistrationManager - Successfully initiated cluster. ZKServers: zookeeper-0.zookeeper.pulsar.svc.cluster.local:2181,zookeeper-1.zookeeper.pulsar.svc.cluster.local:2181,zookeeper-2.zookeeper.pulsar.svc.cluster.local:2181 ledger root path: /ledgers instanceId: d522dd92-aea6-4d29-a3b2-2101b0ef754d
20:47:06.390 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100015d27a00000
20:47:06.390 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x100015d27a00000 closed```

----
2020-06-13 21:00:49 UTC - Anup Ghatage: That looks fine
----
2020-06-13 21:01:05 UTC - Anup Ghatage: @Marcio Martins
Perhaps it’s best to start a thread on the dev@ mailing list.
If you’ve done everything vanilla and this still happens, it might be worth updating the documentation with the changes required for the way we’re deploying.
----
2020-06-13 21:01:10 UTC - Marcio Martins: The bookies see each other:
```20:49:34.489 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /default-rack/bookie-1.bookie.pulsar.svc.cluster.local:3181
20:49:34.494 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /default-rack/bookie-2.bookie.pulsar.svc.cluster.local:3181```

----
2020-06-13 21:02:04 UTC - Marcio Martins: No, I didn't do everything vanilla; I adapted the Helm charts in the repo. I'm not sure what went wrong, though; I think I got everything correct...
----
2020-06-13 21:46:06 UTC - Anup Ghatage: Can you also try the `listcookies` command? Let's check if the bookies are registered at least
----
2020-06-14 02:55:12 UTC - Liam Clarke: Do you have
```brokerDeleteInactiveTopicsEnabled=false```
configured?
----
2020-06-14 02:58:11 UTC - Liam Clarke: Hi all, I'm testing BK's autorecovery, and it's not working as I'd expect; I'd appreciate any guidance.

I have 3 bookies running with Docker-compose, and I've configured the namespace accordingly:

```bin/pulsar-admin namespaces set-persistence test-tenant/test-namespace \
                            --bookkeeper-ensemble 2 \
                            --bookkeeper-ack-quorum 1 \
                            --bookkeeper-write-quorum 1 \
                            --ml-mark-delete-max-rate 0 ```
I created a topic and fired some data at it, and obtained the ledgerId:

```./pulsar-admin topics info-internal test-tenant/test-namespace/example-topic
{
  "version": 1,
  "creationDate": "2020-06-14T02:45:08.961Z",
  "modificationDate": "2020-06-14T02:45:09.055Z",
  "ledgers": [
    {
      "ledgerId": 42
    }
  ],
  "cursors": {}
}```
I identified the Bookies in the ledger's ensemble:

```./bookkeeper shell ledgermetadata -ledgerid 42
ledgerID: 42
LedgerMetadata{formatVersion=3, ensembleSize=2, writeQuorumSize=1, ackQuorumSize=1, state=OPEN, digestType=CRC32C, password=base64:, ensembles={0=[172.21.0.5:3181, 172.21.0.4:3181]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:dGVzdC10ZW5hbnQvdGVzdC1uYW1lc3BhY2UvcGVyc2lzdGVudC9leGFtcGxlLXRvcGlj, application=base64:cHVsc2Fy}}```
I then `docker-compose kill`  one of the bookies in the ensemble (in this case, 172.21.0.5).

The autorecovery auditor knows it's underreplicated:

```./bookkeeper shell listunderreplicated
42
        Ctime : 1592103060751```
But the behaviour I'm expecting, that it creates a new replica on a bookie not previously part of the ensemble, isn't occurring. In the logs I see this:

```bookie2      | 2020-06-14 03:04:19,401 - ERROR - [bookkeeper-io-14-10:PerChannelBookieClient$ConnectionFutureListener@2454] - Could not connect to bookie: [id: 0x0bec2643]/172.21.0.4:3181, current state CONNECTING : 
...
bookie2      | Caused by: java.net.NoRouteToHostException: No route to host
bookie2      |  ... 11 more
bookie2      | 2020-06-14 03:04:19,402 - ERROR - [BookKeeperClientWorker-OrderedExecutor-10-0:ReadLastConfirmedOp@141] - While readLastConfirmed ledger: 42 did not hear success responses from all quorums
bookie2      | 2020-06-14 03:04:19,402 - INFO  - [ReplicationWorker:ReplicationWorker@290] - BKReadException while rereplicating ledger 42. Enough Bookies might not have available So, no harm to continue```

Does the last log message mean that the replica on 172.21.0.4 wasn't up to date enough to replicate from?
----
2020-06-14 03:05:00 UTC - Anup Ghatage: Hi @Liam Clarke,
What is the problem you’re seeing exactly?
I assume it's that auto recovery is not replicating?
Couple of things you could try here:
• Have you tried with a higher ensemble and quorum numbers? (Try 3,3,3 / 3,2,2)
• Use bookie shell to check the underreplicated ledgers when you’re expecting them to replicate
Let's side-bar if you have more questions.
----
2020-06-14 03:06:20 UTC - Liam Clarke: Thanks for the reply @Anup Ghatage I accidentally sent the message before putting all the details in :slightly_smiling_face: Please see my edit above.
----
2020-06-14 03:06:35 UTC - Anup Ghatage: Sure, going through it right now.
----
2020-06-14 03:07:59 UTC - Liam Clarke: I'm guessing that the lower ack / write quorums meant that the data on the remaining ensemble member wasn't up to date when I killed the bookie?
----
2020-06-14 03:10:02 UTC - Anup Ghatage: Yeah looks like it.
Can you try with higher quorum numbers?
Try 3/3/3 (E/W/A), assuming you have more bookies deployed.
----
2020-06-14 03:16:35 UTC - Anup Ghatage: @Liam Clarke You can also force the auto replication to happen via the `triggeraudit`  command
----
2020-06-14 03:29:10 UTC - Liam Clarke: Okay with 3/3/3 I get a `Failure NotEnoughBookiesException: Not enough non-faulty bookies available while writing entry: 3000 while recovering ledger: 18`

Which makes sense now that I've only got two bookies.

I tried again with 2/2/2, so that after killing 1 of the 3 nodes it still has enough to create an ensemble, and it's still not working as I expect.

When running `bookkeeper shell recover -l <ledgerid> <bookieid>`, it tries repeatedly (and fails) to connect to the failed bookie.
----
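For context, the `NotEnoughBookiesException` above follows from simple arithmetic: re-replication has to form a fresh ensemble of `ensembleSize` live bookies. This toy check (illustrative only, not BookKeeper's actual placement logic) captures why 3/3/3 fails with one of three bookies down while 2/2/2 can proceed:

```java
public class QuorumCheck {
    // Illustrative only: models the constraint behind NotEnoughBookiesException.
    // A replacement ensemble of `ensembleSize` bookies must be drawn
    // from the bookies that are still alive.
    static boolean canFormEnsemble(int totalBookies, int failedBookies, int ensembleSize) {
        return totalBookies - failedBookies >= ensembleSize;
    }

    public static void main(String[] args) {
        // 3 bookies, 1 killed, E=3 -> cannot recover (first attempt above)
        System.out.println(canFormEnsemble(3, 1, 3)); // false
        // 3 bookies, 1 killed, E=2 -> recovery possible (second attempt)
        System.out.println(canFormEnsemble(3, 1, 2)); // true
    }
}
```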
2020-06-14 03:35:16 UTC - Liam Clarke: Wait, I tell a lie, it's recovered now.
----
2020-06-14 03:35:34 UTC - Anup Ghatage: You owe me a beer :stuck_out_tongue_closed_eyes:
----
2020-06-14 03:50:51 UTC - Liam Clarke: Haha, thank you for your help; happy to buy you one the next time you're in NZ.

Ahhhhh might have been because the ledger was still open, according to the docs:

>  If the replication worker finds a fragment which needs rereplication, but does not have a defined endpoint (i.e. the final fragment of a ledger currently being written to), it will wait for a grace period before attempting rereplication
Prior to it recovering, the ledger was still open, according to the metadata at least.
+1 : Anup Ghatage
----
2020-06-14 04:52:28 UTC - Sijie Guo: you don’t need to do that. Did you see any errors in the proxy log?
----
2020-06-14 05:13:27 UTC - Sankararao Routhu: Hi @Sijie Guo, thanks for replying. I see the following error in the proxy logs
----
2020-06-14 05:13:29 UTC - Sankararao Routhu: ```2020-06-13 22:12:05,560 -0700 [pulsar-proxy-io-2-4] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/35.167.191.252:57232] Connection closed
2020-06-13 22:12:05,602 -0700 [pulsar-proxy-io-2-6] WARN  org.apache.pulsar.proxy.server.LookupProxyHandler - [/103.15.250.25:35321] Failed to get next active broker No active broker is available
org.apache.pulsar.broker.PulsarServerException: No active broker is available
	at org.apache.pulsar.proxy.server.BrokerDiscoveryProvider.nextBroker(BrokerDiscoveryProvider.java:94) ~[org.apache.pulsar-pulsar-proxy-2.5.0.jar:2.5.0]
	at org.apache.pulsar.proxy.server.LookupProxyHandler.handleLookup(LookupProxyHandler.java:106) [org.apache.pulsar-pulsar-proxy-2.5.0.jar:2.5.0]
	at org.apache.pulsar.proxy.server.ProxyConnection.handleLookup(ProxyConnection.java:387) [org.apache.pulsar-pulsar-proxy-2.5.0.jar:2.5.0]
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:126) [org.apache.pulsar-pulsar-common-2.5.0.jar:2.5.0]
	at org.apache.pulsar.proxy.server.ProxyConnection.channelRead(ProxyConnection.java:174) [org.apache.pulsar-pulsar-proxy-2.5.0.jar:2.5.0]```
----
2020-06-14 05:13:55 UTC - Sankararao Routhu: But my broker is active, as I restarted it
----
2020-06-14 05:15:30 UTC - Sankararao Routhu: Here is the complete stack trace @Sijie Guo
----
2020-06-14 05:15:32 UTC - Sankararao Routhu: ```2020-06-13 22:12:05,560 -0700 [pulsar-proxy-io-2-4] INFO  org.apache.pulsar.proxy.server.ProxyConnection - [/35.167.191.252:57232] Connection closed
2020-06-13 22:12:05,602 -0700 [pulsar-proxy-io-2-6] WARN  org.apache.pulsar.proxy.server.LookupProxyHandler - [/103.15.250.25:35321] Failed to get next active broker No active broker is available
org.apache.pulsar.broker.PulsarServerException: No active broker is available
	at org.apache.pulsar.proxy.server.BrokerDiscoveryProvider.nextBroker(BrokerDiscoveryProvider.java:94) ~[org.apache.pulsar-pulsar-proxy-2.5.0.jar:2.5.0]
	at org.apache.pulsar.proxy.server.LookupProxyHandler.handleLookup(LookupProxyHandler.java:106) [org.apache.pulsar-pulsar-proxy-2.5.0.jar:2.5.0]
	at org.apache.pulsar.proxy.server.ProxyConnection.handleLookup(ProxyConnection.java:387) [org.apache.pulsar-pulsar-proxy-2.5.0.jar:2.5.0]
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:126) [org.apache.pulsar-pulsar-common-2.5.0.jar:2.5.0]
	at org.apache.pulsar.proxy.server.ProxyConnection.channelRead(ProxyConnection.java:174) [org.apache.pulsar-pulsar-proxy-2.5.0.jar:2.5.0]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:326) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:300) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1478) [io.netty-netty-handler-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1239) [io.netty-netty-handler-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1276) [io.netty-netty-handler-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:503) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:281) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]```
----
2020-06-14 05:16:37 UTC - Sankararao Routhu: We are using zookeeper for service discovery from proxy
----
2020-06-14 05:21:02 UTC - Sijie Guo: If you are running k8s, I usually recommend using brokerServiceURL for service discovery instead of zookeeper discovery.
----
2020-06-14 05:21:37 UTC - Sijie Guo: because of session expiry, zookeeper discovery can be a problem.
----
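For reference, switching a proxy from zookeeper discovery to direct broker URLs (as Sijie suggests, and as described later in the thread) amounts to a proxy.conf along these lines. This is a sketch only; the host names are placeholders, while the key names are the standard ones from the stock proxy.conf:

```
# Leave the zookeeper discovery settings empty/commented out
# zookeeperServers=
# configurationStoreServers=

# Point the proxy directly at the brokers (or their load balancer)
brokerServiceURL=pulsar://brokers.example.com:6650
brokerWebServiceURL=http://brokers.example.com:8080

# TLS variants, if TLS is enabled
brokerServiceURLTLS=pulsar+ssl://brokers.example.com:6651
brokerWebServiceURLTLS=https://brokers.example.com:8443
```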
2020-06-14 05:26:40 UTC - Sankararao Routhu: We are not running in K8s @Sijie Guo
----
2020-06-14 05:27:10 UTC - Sankararao Routhu: is it failing because we are using zookeeper service discovery?
----
2020-06-14 05:27:50 UTC - Sankararao Routhu: We had a challenge using brokerServiceURL, so we switched to zookeeper service discovery
----
2020-06-14 05:29:42 UTC - Sankararao Routhu: We have multiple broker AWS instances behind an NLB. The proxy was not able to connect to the broker NLB
----
2020-06-14 05:30:29 UTC - Sankararao Routhu: Is the proxy not able to connect after restarting brokers because of zookeeper service discovery, @Sijie Guo?
----
2020-06-14 05:37:40 UTC - Sijie Guo: it seems that after restarting brokers, the broker cache in the proxies became empty. Hence the proxies are not able to find any brokers to connect to.

If you have set up an NLB, is it an internal LB or a public LB? The proxy just needs to connect to the NLB
----
2020-06-14 05:41:19 UTC - Sankararao Routhu: it's a public NLB
----
2020-06-14 05:41:37 UTC - Sankararao Routhu: if I use the NLB in the proxy then I get the following error
----
2020-06-14 05:41:40 UTC - Sankararao Routhu: ```2020-06-13 22:39:19,494 -0700 [pulsar-proxy-io-2-4] WARN  org.apache.pulsar.proxy.server.LookupProxyHandler - [<persistent://identity/idm/failovertest2.Queue>] failed to get Partitioned metadata : org.apache.pulsar.client.api.PulsarClientException: Connection already closed
java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: Connection already closed
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) [?:1.8.0_181]
	at org.apache.pulsar.client.impl.ClientCnx.handlePartitionResponse(ClientCnx.java:505) [org.apache.pulsar-pulsar-client-original-2.5.0.jar:2.5.0]
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:120) [org.apache.pulsar-pulsar-common-2.5.0.jar:2.5.0]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:326) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:300) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1478) [io.netty-netty-handler-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1239) [io.netty-netty-handler-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1276) [io.netty-netty-handler-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:503) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:281) [io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: Connection already closed
	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:987) ~[org.apache.pulsar-pulsar-client-original-2.5.0.jar:2.5.0]
	at org.apache.pulsar.client.impl.ClientCnx.handlePartitionResponse(ClientCnx.java:506) ~[org.apache.pulsar-pulsar-client-original-2.5.0.jar:2.5.0]
	... 31 more
2020-06-13 22:39:19,495 -0700 [pulsar-client-shutdown-thread] INFO  org.apache.pulsar.proxy.server.ProxyConnectionPool - Closing ProxyConnectionPool.```
----
2020-06-14 05:42:44 UTC - Sankararao Routhu: The proxy is not able to get partitioned metadata
----
2020-06-14 05:42:50 UTC - Sankararao Routhu: @Sijie Guo
----
2020-06-14 05:44:23 UTC - Sankararao Routhu: I have commented out zookeeperServers and configurationStoreServers and provided brokerServiceURLTLS, brokerWebServiceURLTLS in proxy.conf
----
2020-06-14 05:59:28 UTC - Sankararao Routhu: Hi @Sijie Guo
----
2020-06-14 05:59:49 UTC - Sankararao Routhu: can you please let me know if the above config is correct
----
2020-06-14 06:39:46 UTC - Ali Ahmed: <https://github.com/debezium/debezium/pull/1538/>
----
2020-06-14 07:18:06 UTC - Asaf Mesika: Say I have a topic containing tasks to do, and a shared subscription, with multiple worker machines using it to execute those tasks. How can I know the state of the subscription in terms of number of unacknowledged messages? (i.e. size of the queue)
----