You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pulsar.apache.org by Dave Fisher <wa...@comcast.net> on 2023/04/12 22:49:24 UTC

[BUGS] Pulsar 3.0 GKE Cluster Won't Start

Hi -

We do End to end testing of Pulsar versions on GCP and with Pulsar 3.0 today we have the following problem.

This image shows the broker deployments not coming up. This is a non-TLS setup.



The Broker Pods have errors like this one:
2023-04-12T20:10:35,995+0000 [broker-topic-workers-OrderedExecutor-3-0] ERROR org.apache.pulsar.common.protocol.Commands - [PersistentSubscription{topic= <>persistent://pulsar/pulsar/10.236.0.4:8080/healthcheck <persistent://pulsar/pulsar/10.236.0.4:8080/healthcheck>, name=healthCheck-04a8e22e-3e0b-4aa3-9141-d8970c1b0712}] [-1] Failed to parse message metadata
java.lang.IndexOutOfBoundsException: readerIndex(96) + length(2) exceeds writerIndex(96): UnpooledDuplicatedByteBuf(ridx: 96, widx: 96, cap: 96, unwrapped: CompositeByteBuf(ridx: 96, widx: 96, cap: 96, components=2))
	at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
	at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
	at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
We looked into this but aren’t sure what would cause this to suddenly be an issue. The candidate would be differences in the new version of BK …

Best,
Dave

Re: [BUGS] Pulsar 3.0 GKE Cluster Won't Start

Posted by Hang Chen <ch...@apache.org>.
Hi guys,
    Thanks for Micheal's great job, and he already pushed one PR [1]
to fix this bug on the BookKeeper side. This comment [2] shows how
this bug happens.

I will cut BookKeeper 4.16.1 release soon.

Thanks,
Hang


[1] https://github.com/apache/bookkeeper/pull/3919
[2] https://github.com/apache/bookkeeper/pull/3919#issuecomment-1508220615

Michael Marshall <mm...@apache.org> 于2023年4月14日周五 07:32写道:
>
> I ended up digging a little longer. I have some partial results to share.
>
> It seems to me the problem was likely introduced by this Bookkeeper PR
> https://github.com/apache/bookkeeper/pull/3783.
>
> These are likely the problematic lines:
> https://github.com/apache/bookkeeper/blob/234b817cdb4e054887ffd5e42eaed25dc02daf63/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/checksum/DigestManager.java#L161-L167
>
> My current theory is that we're using a CompositeByteBuf when the
> brokerEntryMetadataInterceptors is in use. Because of that, the call
> in OpAddEntry [0] does not produce a deep copy of the readerIndex and
> the writerIndex. When pulsar passes the "duplicateBuffer" to the
> bookkeeper client, the internal buffer's readerIndex and writerIndex
> are both mutable state. Therefore, when we call the `writeBytes`
> method in the DigestManager, the buffer's readerIndex is moved
> forward. Interestingly, when the buffer is big enough, we rely on
> `ByteBufList` to coalesce the buffers and it appears the readerIndex
> is maintained in that case.
>
> That likely explains why we're seeing the readerIndex equal to the
> buffer's capacity.
>
> It'd be good to get a definitive test to confirm this theory. (Sorry,
> I would have tested myself, but I need to sign off now.)
>
> Thanks,
> Michael
>
> [0] https://github.com/apache/pulsar/blob/bafecb2a9c0e73942de6a38df72dd5888d5afd66/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java#L126
>
> On Thu, Apr 13, 2023 at 5:31 PM Michael Marshall <mm...@apache.org> wrote:
> >
> > By my testing using the above steps to reproduce the issue, I do not
> > see the bug on commit [0] but I do see it on the subsequent commit
> > [1].
> >
> > That indicates [1], which is the bookkeeper client upgrade to 4.16.0
> > commit, introduced the problem. I need to sign off soon. I appreciate
> > any help you can provide in looking into this issue!
> >
> > Thanks,
> > Michael
> >
> > [0] https://github.com/apache/pulsar/commit/25af808970fc02281a0b9345c5de205654d7b789
> > [1] https://github.com/apache/pulsar/commit/a7a605f13565edc380b8ae01808832c426f16d1f
> >
> > On Thu, Apr 13, 2023 at 4:51 PM Michael Marshall <mm...@apache.org> wrote:
> > >
> > > Here's an update on my progress. I reproduced Dave's observations with
> > > the following steps:
> > >
> > > 1. Download and unpack 3.0.0 RC 1.
> > >
> > > 2. Add these two settings to the conf/standalone.conf:
> > > exposingBrokerEntryMetadataToClientEnabled=true
> > > brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor,org.apache.pulsar.common.intercept.AppendBrokerTimestampMetadataInterceptor
> > >
> > > 3. Run bin/pulsar standalone
> > >
> > > 4. Run curl http://localhost:8080/admin/v2/brokers/health
> > >
> > > 5. View the following log:
> > >
> > > 2023-04-13T16:48:39,256-0500
> > > [broker-topic-workers-OrderedExecutor-7-0] ERROR
> > > org.apache.pulsar.common.protocol.Commands -
> > > [PersistentSubscription{topic=persistent://pulsar/standalone/localhost:8080/healthcheck,
> > > name=healthCheck-012d2e05-371f-4ddc-a666-daf9fa243d87}] [-1] Failed to
> > > parse message metadata
> > > java.lang.IndexOutOfBoundsException: readerIndex(100) + length(2)
> > > exceeds writerIndex(100): UnpooledDuplicatedByteBuf(ridx: 100, widx:
> > > 100, cap: 100, unwrapped: CompositeByteBuf(ridx: 100, widx: 100, cap:
> > > 100, components=2))
> > >     at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442)
> > > ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> > >     at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749)
> > > ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> > >     at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692)
> > > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> > >     at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452)
> > > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> > >     at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445)
> > > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> > >     at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899)
> > > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> > >     at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918)
> > > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> > >     at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142)
> > > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
> > >     at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100)
> > > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
> > >     at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210)
> > > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
> > >     at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151)
> > > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
> > >     at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
> > > ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> > >     at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
> > > ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> > >     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > > ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
> > >     at java.lang.Thread.run(Thread.java:833) ~[?:?]
> > >
> > > Now that we have a consistent way to reproduce it, I'll work on
> > > understanding the root cause. I believe we run with these two settings
> > > with other versions, so this feels like a regression. I cannot say for
> > > sure yet.
> > >
> > > Thanks,
> > > Michael
> > >
> > > On Thu, Apr 13, 2023 at 1:47 AM Enrico Olivelli <eo...@gmail.com> wrote:
> > > >
> > > > Dave,
> > > > IIUC (from offline discussion) those tests started to fail only recently,
> > > > as we run those tests against the master branch of apache/pulsar repo.
> > > >
> > > > Does the beginning of failures match the upgrade to BK 4.16 ?
> > > > The problem may be related to the handling of ByteBufs, IIRC there are a
> > > > few changes in the latest
> > > > BK release.
> > > >
> > > > Enrico
> > > >
> > > > Il giorno gio 13 apr 2023 alle ore 05:02 Hang Chen <ch...@apache.org> ha
> > > > scritto:
> > > >
> > > > > Hi Dave,
> > > > >     Thanks for verifying this release candidate. I have setup the Pulsar
> > > > > cluster with 3.0 on my laptop and test the produce and consume, it works
> > > > > fine. We also deploy the it on k8s, the produce, consume and health check
> > > > > also works fine. Would you please share your broker configurations?
> > > > >
> > > > > Thanks,
> > > > > Hang
> > > > >
> > > > > Dave Fisher <wa...@comcast.net> 于2023年4月13日周四 06:49写道:
> > > > >
> > > > >> Hi -
> > > > >>
> > > > >> We do End to end testing of Pulsar versions on GCP and with Pulsar 3.0
> > > > >> today we have the following problem.
> > > > >>
> > > > >> This image shows the broker deployments not coming up. This is a non-TLS
> > > > >> setup.
> > > > >>
> > > > >> [image: Screen Shot 2023-04-12 at 1.09.59 PM.png]
> > > > >>
> > > > >> The Broker Pods have errors like this one:
> > > > >>
> > > > >> 2023-04-12T20:10:35,995+0000 [broker-topic-workers-OrderedExecutor-3-0] ERROR org.apache.pulsar.common.protocol.Commands - [PersistentSubscription{topic=persistent://pulsar/pulsar/10.236.0.4:8080/healthcheck, name=healthCheck-04a8e22e-3e0b-4aa3-9141-d8970c1b0712}] [-1] Failed to parse message metadata
> > > > >> java.lang.IndexOutOfBoundsException: readerIndex(96) + length(2) exceeds writerIndex(96): UnpooledDuplicatedByteBuf(ridx: 96, widx: 96, cap: 96, unwrapped: CompositeByteBuf(ridx: 96, widx: 96, cap: 96, components=2))
> > > > >>      at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> > > > >>      at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> > > > >>      at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > > >>      at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > > >>      at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > > >>      at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > > >>      at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > > >>      at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > > >>      at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > > >>      at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > > >>      at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > > >>      at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> > > > >>      at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> > > > >>      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
> > > > >>      at java.lang.Thread.run(Thread.java:833) ~[?:?]
> > > > >>
> > > > >> We looked into this but aren’t sure what would cause this to suddenly be
> > > > >> an issue. The candidate would be differences in the new version of BK …
> > > > >>
> > > > >> Best,
> > > > >> Dave
> > > > >>
> > > > >

Re: [BUGS] Pulsar 3.0 GKE Cluster Won't Start

Posted by Michael Marshall <mm...@apache.org>.
I ended up digging a little longer. I have some partial results to share.

It seems to me the problem was likely introduced by this Bookkeeper PR
https://github.com/apache/bookkeeper/pull/3783.

These are likely the problematic lines:
https://github.com/apache/bookkeeper/blob/234b817cdb4e054887ffd5e42eaed25dc02daf63/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/checksum/DigestManager.java#L161-L167

My current theory is that we're using a CompositeByteBuf when the
brokerEntryMetadataInterceptors is in use. Because of that, the call
in OpAddEntry [0] does not produce a deep copy of the readerIndex and
the writerIndex. When pulsar passes the "duplicateBuffer" to the
bookkeeper client, the internal buffer's readerIndex and writerIndex
are both mutable state. Therefore, when we call the `writeBytes`
method in the DigestManager, the buffer's readerIndex is moved
forward. Interestingly, when the buffer is big enough, we rely on
`ByteBufList` to coalesce the buffers and it appears the readerIndex
is maintained in that case.

That likely explains why we're seeing the readerIndex equal to the
buffer's capacity.

It'd be good to get a definitive test to confirm this theory. (Sorry,
I would have tested myself, but I need to sign off now.)

Thanks,
Michael

[0] https://github.com/apache/pulsar/blob/bafecb2a9c0e73942de6a38df72dd5888d5afd66/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java#L126

On Thu, Apr 13, 2023 at 5:31 PM Michael Marshall <mm...@apache.org> wrote:
>
> By my testing using the above steps to reproduce the issue, I do not
> see the bug on commit [0] but I do see it on the subsequent commit
> [1].
>
> That indicates [1], which is the bookkeeper client upgrade to 4.16.0
> commit, introduced the problem. I need to sign off soon. I appreciate
> any help you can provide in looking into this issue!
>
> Thanks,
> Michael
>
> [0] https://github.com/apache/pulsar/commit/25af808970fc02281a0b9345c5de205654d7b789
> [1] https://github.com/apache/pulsar/commit/a7a605f13565edc380b8ae01808832c426f16d1f
>
> On Thu, Apr 13, 2023 at 4:51 PM Michael Marshall <mm...@apache.org> wrote:
> >
> > Here's an update on my progress. I reproduced Dave's observations with
> > the following steps:
> >
> > 1. Download and unpack 3.0.0 RC 1.
> >
> > 2. Add these two settings to the conf/standalone.conf:
> > exposingBrokerEntryMetadataToClientEnabled=true
> > brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor,org.apache.pulsar.common.intercept.AppendBrokerTimestampMetadataInterceptor
> >
> > 3. Run bin/pulsar standalone
> >
> > 4. Run curl http://localhost:8080/admin/v2/brokers/health
> >
> > 5. View the following log:
> >
> > 2023-04-13T16:48:39,256-0500
> > [broker-topic-workers-OrderedExecutor-7-0] ERROR
> > org.apache.pulsar.common.protocol.Commands -
> > [PersistentSubscription{topic=persistent://pulsar/standalone/localhost:8080/healthcheck,
> > name=healthCheck-012d2e05-371f-4ddc-a666-daf9fa243d87}] [-1] Failed to
> > parse message metadata
> > java.lang.IndexOutOfBoundsException: readerIndex(100) + length(2)
> > exceeds writerIndex(100): UnpooledDuplicatedByteBuf(ridx: 100, widx:
> > 100, cap: 100, unwrapped: CompositeByteBuf(ridx: 100, widx: 100, cap:
> > 100, components=2))
> >     at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442)
> > ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> >     at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749)
> > ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> >     at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692)
> > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> >     at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452)
> > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> >     at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445)
> > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> >     at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899)
> > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> >     at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918)
> > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
> >     at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142)
> > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
> >     at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100)
> > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
> >     at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210)
> > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
> >     at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151)
> > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
> >     at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
> > ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> >     at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
> > ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> >     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
> >     at java.lang.Thread.run(Thread.java:833) ~[?:?]
> >
> > Now that we have a consistent way to reproduce it, I'll work on
> > understanding the root cause. I believe we run with these two settings
> > with other versions, so this feels like a regression. I cannot say for
> > sure yet.
> >
> > Thanks,
> > Michael
> >
> > On Thu, Apr 13, 2023 at 1:47 AM Enrico Olivelli <eo...@gmail.com> wrote:
> > >
> > > Dave,
> > > IIUC (from offline discussion) those tests started to fail only recently,
> > > as we run those tests against the master branch of apache/pulsar repo.
> > >
> > > Does the beginning of failures match the upgrade to BK 4.16 ?
> > > The problem may be related to the handling of ByteBufs, IIRC there are a
> > > few changes in the latest
> > > BK release.
> > >
> > > Enrico
> > >
> > > Il giorno gio 13 apr 2023 alle ore 05:02 Hang Chen <ch...@apache.org> ha
> > > scritto:
> > >
> > > > Hi Dave,
> > > >     Thanks for verifying this release candidate. I have setup the Pulsar
> > > > cluster with 3.0 on my laptop and test the produce and consume, it works
> > > > fine. We also deploy the it on k8s, the produce, consume and health check
> > > > also works fine. Would you please share your broker configurations?
> > > >
> > > > Thanks,
> > > > Hang
> > > >
> > > > Dave Fisher <wa...@comcast.net> 于2023年4月13日周四 06:49写道:
> > > >
> > > >> Hi -
> > > >>
> > > >> We do End to end testing of Pulsar versions on GCP and with Pulsar 3.0
> > > >> today we have the following problem.
> > > >>
> > > >> This image shows the broker deployments not coming up. This is a non-TLS
> > > >> setup.
> > > >>
> > > >> [image: Screen Shot 2023-04-12 at 1.09.59 PM.png]
> > > >>
> > > >> The Broker Pods have errors like this one:
> > > >>
> > > >> 2023-04-12T20:10:35,995+0000 [broker-topic-workers-OrderedExecutor-3-0] ERROR org.apache.pulsar.common.protocol.Commands - [PersistentSubscription{topic=persistent://pulsar/pulsar/10.236.0.4:8080/healthcheck, name=healthCheck-04a8e22e-3e0b-4aa3-9141-d8970c1b0712}] [-1] Failed to parse message metadata
> > > >> java.lang.IndexOutOfBoundsException: readerIndex(96) + length(2) exceeds writerIndex(96): UnpooledDuplicatedByteBuf(ridx: 96, widx: 96, cap: 96, unwrapped: CompositeByteBuf(ridx: 96, widx: 96, cap: 96, components=2))
> > > >>      at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> > > >>      at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> > > >>      at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > >>      at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > >>      at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > >>      at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > >>      at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > >>      at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > >>      at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > >>      at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > >>      at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > > >>      at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> > > >>      at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> > > >>      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
> > > >>      at java.lang.Thread.run(Thread.java:833) ~[?:?]
> > > >>
> > > >> We looked into this but aren’t sure what would cause this to suddenly be
> > > >> an issue. The candidate would be differences in the new version of BK …
> > > >>
> > > >> Best,
> > > >> Dave
> > > >>
> > > >

Re: [BUGS] Pulsar 3.0 GKE Cluster Won't Start

Posted by Michael Marshall <mm...@apache.org>.
By my testing using the above steps to reproduce the issue, I do not
see the bug on commit [0] but I do see it on the subsequent commit
[1].

That indicates [1], which is the bookkeeper client upgrade to 4.16.0
commit, introduced the problem. I need to sign off soon. I appreciate
any help you can provide in looking into this issue!

Thanks,
Michael

[0] https://github.com/apache/pulsar/commit/25af808970fc02281a0b9345c5de205654d7b789
[1] https://github.com/apache/pulsar/commit/a7a605f13565edc380b8ae01808832c426f16d1f

On Thu, Apr 13, 2023 at 4:51 PM Michael Marshall <mm...@apache.org> wrote:
>
> Here's an update on my progress. I reproduced Dave's observations with
> the following steps:
>
> 1. Download and unpack 3.0.0 RC 1.
>
> 2. Add these two settings to the conf/standalone.conf:
> exposingBrokerEntryMetadataToClientEnabled=true
> brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor,org.apache.pulsar.common.intercept.AppendBrokerTimestampMetadataInterceptor
>
> 3. Run bin/pulsar standalone
>
> 4. Run curl http://localhost:8080/admin/v2/brokers/health
>
> 5. View the following log:
>
> 2023-04-13T16:48:39,256-0500
> [broker-topic-workers-OrderedExecutor-7-0] ERROR
> org.apache.pulsar.common.protocol.Commands -
> [PersistentSubscription{topic=persistent://pulsar/standalone/localhost:8080/healthcheck,
> name=healthCheck-012d2e05-371f-4ddc-a666-daf9fa243d87}] [-1] Failed to
> parse message metadata
> java.lang.IndexOutOfBoundsException: readerIndex(100) + length(2)
> exceeds writerIndex(100): UnpooledDuplicatedByteBuf(ridx: 100, widx:
> 100, cap: 100, unwrapped: CompositeByteBuf(ridx: 100, widx: 100, cap:
> 100, components=2))
>     at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442)
> ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
>     at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749)
> ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
>     at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692)
> ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
>     at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452)
> ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
>     at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445)
> ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
>     at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899)
> ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
>     at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918)
> ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
>     at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142)
> ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
>     at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100)
> ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
>     at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210)
> ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
>     at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151)
> ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
>     at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
> ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
>     at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
> ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
>     at java.lang.Thread.run(Thread.java:833) ~[?:?]
>
> Now that we have a consistent way to reproduce it, I'll work on
> understanding the root cause. I believe we run with these two settings
> with other versions, so this feels like a regression. I cannot say for
> sure yet.
>
> Thanks,
> Michael
>
> On Thu, Apr 13, 2023 at 1:47 AM Enrico Olivelli <eo...@gmail.com> wrote:
> >
> > Dave,
> > IIUC (from offline discussion) those tests started to fail only recently,
> > as we run those tests against the master branch of apache/pulsar repo.
> >
> > Does the beginning of failures match the upgrade to BK 4.16 ?
> > The problem may be related to the handling of ByteBufs, IIRC there are a
> > few changes in the latest
> > BK release.
> >
> > Enrico
> >
> > Il giorno gio 13 apr 2023 alle ore 05:02 Hang Chen <ch...@apache.org> ha
> > scritto:
> >
> > > Hi Dave,
> > >     Thanks for verifying this release candidate. I have setup the Pulsar
> > > cluster with 3.0 on my laptop and test the produce and consume, it works
> > > fine. We also deploy the it on k8s, the produce, consume and health check
> > > also works fine. Would you please share your broker configurations?
> > >
> > > Thanks,
> > > Hang
> > >
> > > Dave Fisher <wa...@comcast.net> 于2023年4月13日周四 06:49写道:
> > >
> > >> Hi -
> > >>
> > >> We do End to end testing of Pulsar versions on GCP and with Pulsar 3.0
> > >> today we have the following problem.
> > >>
> > >> This image shows the broker deployments not coming up. This is a non-TLS
> > >> setup.
> > >>
> > >> [image: Screen Shot 2023-04-12 at 1.09.59 PM.png]
> > >>
> > >> The Broker Pods have errors like this one:
> > >>
> > >> 2023-04-12T20:10:35,995+0000 [broker-topic-workers-OrderedExecutor-3-0] ERROR org.apache.pulsar.common.protocol.Commands - [PersistentSubscription{topic=persistent://pulsar/pulsar/10.236.0.4:8080/healthcheck, name=healthCheck-04a8e22e-3e0b-4aa3-9141-d8970c1b0712}] [-1] Failed to parse message metadata
> > >> java.lang.IndexOutOfBoundsException: readerIndex(96) + length(2) exceeds writerIndex(96): UnpooledDuplicatedByteBuf(ridx: 96, widx: 96, cap: 96, unwrapped: CompositeByteBuf(ridx: 96, widx: 96, cap: 96, components=2))
> > >>      at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> > >>      at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> > >>      at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > >>      at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > >>      at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > >>      at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > >>      at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > >>      at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > >>      at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > >>      at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > >>      at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> > >>      at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> > >>      at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> > >>      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
> > >>      at java.lang.Thread.run(Thread.java:833) ~[?:?]
> > >>
> > >> We looked into this but aren’t sure what would cause this to suddenly be
> > >> an issue. The candidate would be differences in the new version of BK …
> > >>
> > >> Best,
> > >> Dave
> > >>
> > >

Re: [BUGS] Pulsar 3.0 GKE Cluster Won't Start

Posted by Michael Marshall <mm...@apache.org>.
Here's an update on my progress. I reproduced Dave's observations with
the following steps:

1. Download and unpack 3.0.0 RC 1.

2. Add these two settings to the conf/standalone.conf:
exposingBrokerEntryMetadataToClientEnabled=true
brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor,org.apache.pulsar.common.intercept.AppendBrokerTimestampMetadataInterceptor

3. Run bin/pulsar standalone

4. Run curl http://localhost:8080/admin/v2/brokers/health

5. View the following log:

2023-04-13T16:48:39,256-0500
[broker-topic-workers-OrderedExecutor-7-0] ERROR
org.apache.pulsar.common.protocol.Commands -
[PersistentSubscription{topic=persistent://pulsar/standalone/localhost:8080/healthcheck,
name=healthCheck-012d2e05-371f-4ddc-a666-daf9fa243d87}] [-1] Failed to
parse message metadata
java.lang.IndexOutOfBoundsException: readerIndex(100) + length(2)
exceeds writerIndex(100): UnpooledDuplicatedByteBuf(ridx: 100, widx:
100, cap: 100, unwrapped: CompositeByteBuf(ridx: 100, widx: 100, cap:
100, components=2))
    at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442)
~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
    at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749)
~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
    at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692)
~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
    at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452)
~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
    at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445)
~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
    at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899)
~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
    at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918)
~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0]
    at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142)
~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
    at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100)
~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
    at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210)
~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
    at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151)
~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0]
    at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
    at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]

Now that we have a consistent way to reproduce it, I'll work on
understanding the root cause. I believe we run with these two settings
with other versions, so this feels like a regression. I cannot say for
sure yet.

Thanks,
Michael

On Thu, Apr 13, 2023 at 1:47 AM Enrico Olivelli <eo...@gmail.com> wrote:
>
> Dave,
> IIUC (from offline discussion) those tests started to fail only recently,
> as we run those tests against the master branch of apache/pulsar repo.
>
> Does the beginning of failures match the upgrade to BK 4.16 ?
> The problem may be related to the handling of ByteBufs, IIRC there are a
> few changes in the latest
> BK release.
>
> Enrico
>
> Il giorno gio 13 apr 2023 alle ore 05:02 Hang Chen <ch...@apache.org> ha
> scritto:
>
> > Hi Dave,
> >     Thanks for verifying this release candidate. I have setup the Pulsar
> > cluster with 3.0 on my laptop and test the produce and consume, it works
> > fine. We also deploy the it on k8s, the produce, consume and health check
> > also works fine. Would you please share your broker configurations?
> >
> > Thanks,
> > Hang
> >
> > Dave Fisher <wa...@comcast.net> 于2023年4月13日周四 06:49写道:
> >
> >> Hi -
> >>
> >> We do End to end testing of Pulsar versions on GCP and with Pulsar 3.0
> >> today we have the following problem.
> >>
> >> This image shows the broker deployments not coming up. This is a non-TLS
> >> setup.
> >>
> >> [image: Screen Shot 2023-04-12 at 1.09.59 PM.png]
> >>
> >> The Broker Pods have errors like this one:
> >>
> >> 2023-04-12T20:10:35,995+0000 [broker-topic-workers-OrderedExecutor-3-0] ERROR org.apache.pulsar.common.protocol.Commands - [PersistentSubscription{topic=persistent://pulsar/pulsar/10.236.0.4:8080/healthcheck, name=healthCheck-04a8e22e-3e0b-4aa3-9141-d8970c1b0712}] [-1] Failed to parse message metadata
> >> java.lang.IndexOutOfBoundsException: readerIndex(96) + length(2) exceeds writerIndex(96): UnpooledDuplicatedByteBuf(ridx: 96, widx: 96, cap: 96, unwrapped: CompositeByteBuf(ridx: 96, widx: 96, cap: 96, components=2))
> >>      at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> >>      at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> >>      at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> >>      at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> >>      at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> >>      at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> >>      at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> >>      at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> >>      at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> >>      at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> >>      at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> >>      at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> >>      at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> >>      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
> >>      at java.lang.Thread.run(Thread.java:833) ~[?:?]
> >>
> >> We looked into this but aren’t sure what would cause this to suddenly be
> >> an issue. The candidate would be differences in the new version of BK …
> >>
> >> Best,
> >> Dave
> >>
> >

Re: [BUGS] Pulsar 3.0 GKE Cluster Won't Start

Posted by Enrico Olivelli <eo...@gmail.com>.
Dave,
IIUC (from offline discussion) those tests started to fail only recently,
as we run those tests against the master branch of apache/pulsar repo.

Does the beginning of failures match the upgrade to BK 4.16 ?
The problem may be related to the handling of ByteBufs, IIRC there are a
few changes in the latest
BK release.

Enrico

Il giorno gio 13 apr 2023 alle ore 05:02 Hang Chen <ch...@apache.org> ha
scritto:

> Hi Dave,
>     Thanks for verifying this release candidate. I have setup the Pulsar
> cluster with 3.0 on my laptop and test the produce and consume, it works
> fine. We also deploy the it on k8s, the produce, consume and health check
> also works fine. Would you please share your broker configurations?
>
> Thanks,
> Hang
>
> Dave Fisher <wa...@comcast.net> 于2023年4月13日周四 06:49写道:
>
>> Hi -
>>
>> We do End to end testing of Pulsar versions on GCP and with Pulsar 3.0
>> today we have the following problem.
>>
>> This image shows the broker deployments not coming up. This is a non-TLS
>> setup.
>>
>> [image: Screen Shot 2023-04-12 at 1.09.59 PM.png]
>>
>> The Broker Pods have errors like this one:
>>
>> 2023-04-12T20:10:35,995+0000 [broker-topic-workers-OrderedExecutor-3-0] ERROR org.apache.pulsar.common.protocol.Commands - [PersistentSubscription{topic=persistent://pulsar/pulsar/10.236.0.4:8080/healthcheck, name=healthCheck-04a8e22e-3e0b-4aa3-9141-d8970c1b0712}] [-1] Failed to parse message metadata
>> java.lang.IndexOutOfBoundsException: readerIndex(96) + length(2) exceeds writerIndex(96): UnpooledDuplicatedByteBuf(ridx: 96, widx: 96, cap: 96, unwrapped: CompositeByteBuf(ridx: 96, widx: 96, cap: 96, components=2))
>> 	at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
>> 	at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
>> 	at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>> 	at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>> 	at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>> 	at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>> 	at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>> 	at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>> 	at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>> 	at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>> 	at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>> 	at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
>> 	at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
>> 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
>> 	at java.lang.Thread.run(Thread.java:833) ~[?:?]
>>
>> We looked into this but aren’t sure what would cause this to suddenly be
>> an issue. The candidate would be differences in the new version of BK …
>>
>> Best,
>> Dave
>>
>

Re: [BUGS] Pulsar 3.0 GKE Cluster Won't Start

Posted by Hang Chen <ch...@apache.org>.
Hi Dave,
    Thanks for verifying this release candidate. I have setup the Pulsar
cluster with 3.0 on my laptop and test the produce and consume, it works
fine. We also deploy the it on k8s, the produce, consume and health check
also works fine. Would you please share your broker configurations?

Thanks,
Hang

Dave Fisher <wa...@comcast.net> 于2023年4月13日周四 06:49写道:

> Hi -
>
> We do End to end testing of Pulsar versions on GCP and with Pulsar 3.0
> today we have the following problem.
>
> This image shows the broker deployments not coming up. This is a non-TLS
> setup.
>
> [image: Screen Shot 2023-04-12 at 1.09.59 PM.png]
>
> The Broker Pods have errors like this one:
>
> 2023-04-12T20:10:35,995+0000 [broker-topic-workers-OrderedExecutor-3-0] ERROR org.apache.pulsar.common.protocol.Commands - [PersistentSubscription{topic=persistent://pulsar/pulsar/10.236.0.4:8080/healthcheck, name=healthCheck-04a8e22e-3e0b-4aa3-9141-d8970c1b0712}] [-1] Failed to parse message metadata
> java.lang.IndexOutOfBoundsException: readerIndex(96) + length(2) exceeds writerIndex(96): UnpooledDuplicatedByteBuf(ridx: 96, widx: 96, cap: 96, unwrapped: CompositeByteBuf(ridx: 96, widx: 96, cap: 96, components=2))
> 	at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> 	at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749) ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final]
> 	at org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918) ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151) ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> 	at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> 	at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
> 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final]
> 	at java.lang.Thread.run(Thread.java:833) ~[?:?]
>
> We looked into this but aren’t sure what would cause this to suddenly be
> an issue. The candidate would be differences in the new version of BK …
>
> Best,
> Dave
>