You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pulsar.apache.org by Rajan Dhabalia <rd...@apache.org> on 2023/06/16 19:47:33 UTC

Re: [DISCUSS] PIP-267: Support multi-topic messageId deserialization to ack messages

Yes, the topic name will not be transferred and it's not part of the wire
protocol. Message uses MessageID protobuf data-structure to serialize and
deserialize MessageId and it doesn't change any behavior nor will transfer
any additional fields to the broker. and I would not like to introduce any
additional data-structure as that will create data copy, field
inconsistencies, and more garbage due to more object allocation and that's
something we would like to avoid.

Thanks,
Rajan

On Mon, May 15, 2023 at 6:18 AM PengHui Li <pe...@apache.org> wrote:

> I think the topic name will not be transmitted to the broker.
> The client side used the class generated by the protobuf message.
> Or, we can create another class to avoid coupling issues, but it
> will introduce more changes and copy data from one structure
> to another. For the long-term, I think it should be a good way if
> we don't have blockers with this solution. Because I don't think
> there is a higher priority in the long run than keeping the protocol clear.
>
> If the above options are not feasible. At least, we should clarify
> it in the proposal and add comments in the proto file to avoid
> other clients transmitting the topic name to the broker.
>
> Thanks,
> Penghui
>
> On Fri, May 12, 2023 at 5:59 PM Asaf Mesika <as...@gmail.com> wrote:
>
> > I don't get it - you say msgId is a data structure contained within
> > MessageId implementation, right? I presume msgId is the data structure
> the
> > client transmit to the server, so that means you are transmitting topic
> to
> > the server?
> >
> >
> > On Fri, May 12, 2023 at 7:45 AM Rajan Dhabalia <rd...@apache.org>
> > wrote:
> >
> > > Thank you for sharing your knowledge about the PIP which should be
> > created
> > > before PR and I think everyone in the community knows about it. but you
> > can
> > > check the PR for context which was blocked for sometime and we decided
> to
> > > create PIP with proto changes.
> > >
> > > This PIP/PR tries to fix the issue where partitioned topic fails while
> > > acking deserialized messageId. topic name will be part of MsgIdData
> which
> > > is the data-structure used by messageID to store msgID context along
> with
> > > partition, batching, and other metadata. topic name will be attached
> only
> > > when the user tries to serialize and deserialize the messageId which
> will
> > > be purely client side implementation and in other cases it will not be
> > > transmitted to server. Also, partitioned topic's abstract concept for
> > user
> > > and messageID must be also remain abstract for users and users must not
> > > know about different implementation of messageId and our goal is to
> > > maintain that abstraction without telling user about MessageIdImpl or
> > > TopicMessageIdImpl.
> > >
> > > Thanks,
> > > Rajan
> > >
> > >
> > > On Thu, May 11, 2023 at 7:29 AM Asaf Mesika <as...@gmail.com>
> > wrote:
> > >
> > > > Hi Rajan,
> > > >
> > > > A few comments on the PIP as I couldn't understand it fully as some
> > > pieces
> > > > of information is missing.
> > > >
> > > > First, I would like to remind about the rules, that exists in the
> > > beginning
> > > > of the PIP template:
> > > >
> > > > <!--
> > > > RULES
> > > > * Never place a link to an external site like Google Doc. The
> proposal
> > > > should be in this issue entirely.
> > > > * Use a spelling and grammar checker tools if available for you
> (there
> > > are
> > > > plenty of free ones)
> > > >
> > > > PROPOSAL HEALTH CHECK
> > > > I can read the design document and understand the problem statement
> and
> > > > what you plan to change *without* resorting to a couple of hours of
> > code
> > > > reading just to start having a high level understanding of the
> change.
> > > > -->
> > > >
> > > >
> > > > In this specific case
> > > > 1. I would include explanation and detail the data structures fields
> of
> > > > objects you mentioned, such as: MessageIdImpl and MessageIdData.
> > > > 2. I would not put a PR as the design section, so I need to read code
> > to
> > > > understand what the exact solution details.
> > > >
> > > > You wrote:
> > > >
> > > > > Pulsar api provides MessageId
> > > > > <
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/MessageId.java
> > > >
> > > > interface
> > > > > which is generally used by producer and consumer applications to
> > manage
> > > > > topic offset.
> > > >
> > > >
> > > > I think it's used to allow consumers to acknowledge (can be per
> > message)
> > > so
> > > > offset if wrong terminology here.
> > > > For producers, not sure exactly its usage. Maybe if they need to
> refer
> > to
> > > > this message later when reading by Reader interface.
> > > > I would correct this section.
> > > >
> > > > However, right now Pulsar doesn't support correct deserialization of
> > > > > multi-topic or partitioned-topic because of that 1acknowledge` API
> > call
> > > > > fails for those topics with below error
> > > >
> > > >
> > > > You're saying that the acknowledgement API method signature receives
> > > > MessageId, but do not receive TopicMessageId?
> > > >
> > > > I have a few questions on that:
> > > >
> > > > 1. The acknowledgement API is part of Pulsar binary protocol. Is your
> > > plan
> > > > to alter that protocol so it will also include the topic field as
> part
> > of
> > > > the message ID?
> > > >
> > > > 2. I think your PIP needs to explain the following items which are
> > > missing
> > > > as context:
> > > > - There are two implementation for MessageId interface, one for topic
> > and
> > > > one for partitioned topic.
> > > > - The problem is that the serialization/desrialization method is used
> > > > mainly for translating the ID into the binary protocol, which only
> > > requires
> > > > the ID (ledger ID, entry ID).
> > > > - The reason for that is that once you created a consumer, it has a
> > topic
> > > > attached to it. Transferring the topic for the ack is redundant.
> > > >
> > > > All of this needs to be in the background.
> > > >
> > > > I have several ideas on solving that, which IMO should mainly be in
> the
> > > > client level, but I must get answers to the questions above before I
> > can
> > > > continue.
> > > >
> > > > Last note
> > > > You have basically placed a link to a pull request as the design
> > solution
> > > > (high-level/detailed design).
> > > > The whole idea of the design is that you describe the solution
> without
> > > > resorting to code.
> > > > IMO you should amend the design, state the goal shortly, and have a
> > high
> > > > level design section which contains 1-2 short paragraphs describing
> > > exactly
> > > > your solution.
> > > >
> > > > Thanks,
> > > >
> > > > Asaf
> > > >
> > > >
> > > >
> > > > On Tue, May 9, 2023 at 3:24 PM Yunze Xu <xy...@apache.org> wrote:
> > > >
> > > > > I'm talking about whether to add a new separate API. I'm concerned
> > > > > about whether existing applications would be affected, no matter if
> > > > > the existing implementation has the limitation. If yes, we should
> > > > > document it in the PIP so that users can know that.
> > > > >
> > > > > > it's a new optional field which would not break the compatibility
> > > > >
> > > > > And yes, I just confirmed it with simple demos in my local env. So
> > I'm
> > > > > +1 to this proposal.
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > On Tue, May 9, 2023 at 3:05 PM Rajan Dhabalia <
> rdhabalia@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > Weill there are multiple things: it's a new optional field which
> > > would
> > > > > not
> > > > > > break the compatibility , also current messaegId serialization
> and
> > > > > > deserialization anyway only impact multi-topic consumer which is
> > > > already
> > > > > > broken or has the limitation and, adding a new separate API for
> > > > > partitioned
> > > > > > topic is not only not acceptable but creates too much confusion
> for
> > > > users
> > > > > > to use separate ack APIs for non-partition and partition topics
> and
> > > > that
> > > > > > also breaks partitioned topic abstraction which we would like to
> > > avoid.
> > > > > >
> > > > > > Thanks,
> > > > > > Rajan
> > > > > >
> > > > > > On Mon, May 8, 2023 at 11:27 PM Yunze Xu <xy...@apache.org> wrote:
> > > > > >
> > > > > > > It seems that `TopicMessageIdImpl#toByteArray` now could
> > serialize
> > > > the
> > > > > > > optional topic field to the bytes. I didn't test it now but I
> > have
> > > a
> > > > > > > concern about whether it would bring a breaking change.
> > > > > > >
> > > > > > > Assuming there are two applications (let's say A and B) based
> on
> > an
> > > > > > > older Pulsar client, A writes serialized bytes into a file, B
> > reads
> > > > > > > bytes from the file and parses it to a MessageId. If A upgraded
> > its
> > > > > > > Pulsar client to the latest while B did not, what would happen?
> > > Could
> > > > > > > B still get the correct MessageId or the bytes would not be
> able
> > to
> > > > > > > parsed?
> > > > > > >
> > > > > > > P.S. it's better to add the API changes and potential breaking
> > > > changes
> > > > > > > in the proposal.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yunze
> > > > > > >
> > > > > > > On Tue, May 9, 2023 at 1:59 PM Rajan Dhabalia <
> > > rdhabalia@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Pulsar api provides MessageId interface which is generally
> used
> > > by
> > > > > > > producer
> > > > > > > > and consumer applications to manage topic offset. Sometimes,
> > > these
> > > > > > > > applications would like to serialize and deserialize
> > messageIds,
> > > > > > > > specifically consumer app which would like to persist
> messageId
> > > and
> > > > > ack
> > > > > > > > with those messageIds by deserializing them. However, right
> now
> > > > > Pulsar
> > > > > > > > doesn't support correct deserialization of multi-topic or
> > > > > > > partitioned-topic
> > > > > > > > because of that 1acknowledge` API call fails for those topics
> > > with
> > > > > below
> > > > > > > > error:
> > > > > > > > "Only TopicMessageId is allowed to acknowledge for a
> > multi-topics
> > > > > > > consumer"
> > > > > > > >
> > > > > > > > MessageIdImpl stores id metadata into MessageIdData which
> > doesn't
> > > > > contain
> > > > > > > > context about topic name to find out which topic belongs to
> > this
> > > > > > > MessageID.
> > > > > > > > Therefore, we need to add topic-name into MessageIdData and
> > allow
> > > > > > > > multi-topic/partitioned topics to deserialize messages
> > correctly
> > > > so,
> > > > > > > > consumer app can perform as expected.
> > > > > > > >
> > > > > > > > Please visit PIP for any suggestions:
> > > > > > > > https://github.com/apache/pulsar/issues/20221
> > > > > > > >
> > > > > > > > This PIP is created with PR:
> > > > > https://github.com/apache/pulsar/pull/19944
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Rajan
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-267: Support multi-topic messageId deserialization to ack messages

Posted by Asaf Mesika <as...@gmail.com>.
I'll continue this on Slack #dev and write the summary here.

Just to clarify any misunderstanding: My intention is to make Pulsar PIP
readable by anyone, which means: Adding the required background information
and explaining your idea in a way people can understand.

In light of this goal, I've introduced a PIP template to make it clear what
is missing and also switched to PRs to make discussion easier than in the
mailing list, thus making participation easier for everyone, which means
more feedback ==> clearer proposals.



On Tue, Jun 20, 2023 at 11:17 PM Rajan Dhabalia <rd...@apache.org>
wrote:

> Hi Asaf,
>
> I really don't know what's your concern but it seems you don't have much
> understanding about Pulsar client/server protocol or you really would like
> to block the PIP. I tried to answer your concerns but let me try again to
> add more context about the implementation if that something can help you:
> this PIP makes change only in protobuf of message-id which is in
> implementation named as MessageIdData and it uses to serialize and
> deserialize messageId for the users. and this PIP is adding a new field to
> support messageId deserialization for partition-topic or multi-consumer
> topics.
> Now, does it impact wire protocol and will the client start sending this
> newly added field topic-name to broker? then answer is no because while
> sending ack command to broker client creates messageID where it doesn't set
> this field [1] and this new field only used during message
> serialization/deserialization when client app calls
> toByteArray()/fromByteArray() methods. so, this should not add any n/w
> overhead for the payload when client sends ack command to broker.
> [1]
>
> https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/protocol/Commands.java#L1018
>
> I am not sure if that helps you to answer the question or I should try to
> talk about Pulsar client-server protocol implementation here but we can
> help you in slack#dev channel if you have more implementation questions.
>
> Thanks,
> Rajan
>
>
>
> On Tue, Jun 20, 2023 at 11:31 AM Asaf Mesika <as...@gmail.com>
> wrote:
>
> > On Tue, Jun 20, 2023 at 9:39 AM Rajan Dhabalia <rd...@apache.org>
> > wrote:
> >
> > > > So you say in that sentence that you will add the topic name into
> > > MessageIdData. MessageIdData is defined in PulsarApi.proto and is
> > > transferred over the wire, so how can you add the topic to this class
> > > without changing the wire protocol?
> > > Yes, the client creates a separate MessageId while creating a
> serialized
> > > payload for acking where it doesn't set or send topicname and it won't
> > > change the payload.
> > >
> > >
> > But it contradicts what you wrote in the design doc. I'm sorry, but I
> don't
> > get it.
> > Can you please help me understand this by elaborating so anyone,
> including
> > me, can fully understand it?
> > Preferably all your answers should be injected into the document, of
> > course.
> >
> > Thanks!
> >
> > Asaf
> >
> >
> >
> > > Thanks,
> > > Rajan
> > >
> > > On Mon, Jun 19, 2023 at 5:45 AM Asaf Mesika <as...@gmail.com>
> > wrote:
> > >
> > > > First, let me add some data that should be added to the Background
> > > section
> > > > of the PIP since I had to reverse engineer the code to understand
> that,
> > > > which is the opposite of the goal of a design document.
> > > >
> > > > ----
> > > > Pulsar Broker has a binary protocol, which allows the client to
> consume
> > > > messages, acknowledge them, and much more. The protocol comprises
> > > Commands
> > > > containing the data needed to apply that Command on the broker side.
> > Many
> > > > commands allow a consumer (client) to acknowledge messages, among
> them:
> > > > CommandSendReceipt, CommandSend, CommandAck, and more. All those
> > commands
> > > > use the message type MessageIdData to specify the details of the
> > message
> > > to
> > > > acknowledge.
> > > >
> > > > Here's what this data structure looks like:
> > > > message MessageIdData {
> > > > required uint64 ledgerId = 1;
> > > > required uint64 entryId = 2;
> > > > optional int32 partition = 3 [default = -1];
> > > > optional int32 batch_index = 4 [default = -1];
> > > > repeated int64 ack_set = 5;
> > > > optional int32 batch_size = 6;
> > > >
> > > > // For the chunk message id, we need to specify the first chunk
> message
> > > id.
> > > > optional MessageIdData first_chunk_message_id = 7;
> > > > }
> > > >
> > > > The key fields are the ledgerID at which the message is contained and
> > > > entryId, which indicates the offset inside the ledger (message
> number).
> > > >
> > > > The client uses a class named MessageIdData which is the
> auto-generated
> > > > code representing the message MessageIdData.
> > > > ---------
> > > >
> > > > Now, in the design, you wrote:
> > > >
> > > > > Thefore, we need to add topic-name into MessageIdData and allow
> > > > > multi-topic/partitioned topic to deserialize message correctly so,
> > API
> > > > like
> > > > > acknowledge can perform as expected.
> > > >
> > > >
> > > > So you say in that sentence that you will add the topic name into
> > > > MessageIdData.
> > > > MessageIdData is defined in PulsarApi.proto and is transferred over
> the
> > > > wire, so how can you add the topic to this class without changing the
> > > wire
> > > > protocol?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jun 16, 2023 at 10:47 PM Rajan Dhabalia <
> rdhabalia@apache.org>
> > > > wrote:
> > > >
> > > > > Yes, the topic name will not be transferred and it's not part of
> the
> > > wire
> > > > > protocol. Message uses MessageID protobuf data-structure to
> serialize
> > > and
> > > > > deserialize MessageId and it doesn't change any behavior nor will
> > > > transfer
> > > > > any additional fields to the broker. and I would not like to
> > introduce
> > > > any
> > > > > additional data-structure as that will create data copy, field
> > > > > inconsistencies, and more garbage due to more object allocation and
> > > > that's
> > > > > something we would like to avoid.
> > > > >
> > > > > Thanks,
> > > > > Rajan
> > > > >
> > > > > On Mon, May 15, 2023 at 6:18 AM PengHui Li <pe...@apache.org>
> > wrote:
> > > > >
> > > > > > I think the topic name will not be transmitted to the broker.
> > > > > > The client side used the class generated by the protobuf message.
> > > > > > Or, we can create another class to avoid coupling issues, but it
> > > > > > will introduce more changes and copy data from one structure
> > > > > > to another. For the long-term, I think it should be a good way if
> > > > > > we don't have blockers with this solution. Because I don't think
> > > > > > there is a higher priority in the long run than keeping the
> > protocol
> > > > > clear.
> > > > > >
> > > > > > If the above options are not feasible. At least, we should
> clarify
> > > > > > it in the proposal and add comments in the proto file to avoid
> > > > > > other clients transmitting the topic name to the broker.
> > > > > >
> > > > > > Thanks,
> > > > > > Penghui
> > > > > >
> > > > > > On Fri, May 12, 2023 at 5:59 PM Asaf Mesika <
> asaf.mesika@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > I don't get it - you say msgId is a data structure contained
> > within
> > > > > > > MessageId implementation, right? I presume msgId is the data
> > > > structure
> > > > > > the
> > > > > > > client transmit to the server, so that means you are
> transmitting
> > > > topic
> > > > > > to
> > > > > > > the server?
> > > > > > >
> > > > > > >
> > > > > > > On Fri, May 12, 2023 at 7:45 AM Rajan Dhabalia <
> > > rdhabalia@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thank you for sharing your knowledge about the PIP which
> should
> > > be
> > > > > > > created
> > > > > > > > before PR and I think everyone in the community knows about
> it.
> > > but
> > > > > you
> > > > > > > can
> > > > > > > > check the PR for context which was blocked for sometime and
> we
> > > > > decided
> > > > > > to
> > > > > > > > create PIP with proto changes.
> > > > > > > >
> > > > > > > > This PIP/PR tries to fix the issue where partitioned topic
> > fails
> > > > > while
> > > > > > > > acking deserialized messageId. topic name will be part of
> > > MsgIdData
> > > > > > which
> > > > > > > > is the data-structure used by messageID to store msgID
> context
> > > > along
> > > > > > with
> > > > > > > > partition, batching, and other metadata. topic name will be
> > > > attached
> > > > > > only
> > > > > > > > when the user tries to serialize and deserialize the
> messageId
> > > > which
> > > > > > will
> > > > > > > > be purely client side implementation and in other cases it
> will
> > > not
> > > > > be
> > > > > > > > transmitted to server. Also, partitioned topic's abstract
> > concept
> > > > for
> > > > > > > user
> > > > > > > > and messageID must be also remain abstract for users and
> users
> > > must
> > > > > not
> > > > > > > > know about different implementation of messageId and our goal
> > is
> > > to
> > > > > > > > maintain that abstraction without telling user about
> > > MessageIdImpl
> > > > or
> > > > > > > > TopicMessageIdImpl.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Rajan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, May 11, 2023 at 7:29 AM Asaf Mesika <
> > > asaf.mesika@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Rajan,
> > > > > > > > >
> > > > > > > > > A few comments on the PIP as I couldn't understand it fully
> > as
> > > > some
> > > > > > > > pieces
> > > > > > > > > of information is missing.
> > > > > > > > >
> > > > > > > > > First, I would like to remind about the rules, that exists
> in
> > > the
> > > > > > > > beginning
> > > > > > > > > of the PIP template:
> > > > > > > > >
> > > > > > > > > <!--
> > > > > > > > > RULES
> > > > > > > > > * Never place a link to an external site like Google Doc.
> The
> > > > > > proposal
> > > > > > > > > should be in this issue entirely.
> > > > > > > > > * Use a spelling and grammar checker tools if available for
> > you
> > > > > > (there
> > > > > > > > are
> > > > > > > > > plenty of free ones)
> > > > > > > > >
> > > > > > > > > PROPOSAL HEALTH CHECK
> > > > > > > > > I can read the design document and understand the problem
> > > > statement
> > > > > > and
> > > > > > > > > what you plan to change *without* resorting to a couple of
> > > hours
> > > > of
> > > > > > > code
> > > > > > > > > reading just to start having a high level understanding of
> > the
> > > > > > change.
> > > > > > > > > -->
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > In this specific case
> > > > > > > > > 1. I would include explanation and detail the data
> structures
> > > > > fields
> > > > > > of
> > > > > > > > > objects you mentioned, such as: MessageIdImpl and
> > > MessageIdData.
> > > > > > > > > 2. I would not put a PR as the design section, so I need to
> > > read
> > > > > code
> > > > > > > to
> > > > > > > > > understand what the exact solution details.
> > > > > > > > >
> > > > > > > > > You wrote:
> > > > > > > > >
> > > > > > > > > > Pulsar api provides MessageId
> > > > > > > > > > <
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/MessageId.java
> > > > > > > > >
> > > > > > > > > interface
> > > > > > > > > > which is generally used by producer and consumer
> > applications
> > > > to
> > > > > > > manage
> > > > > > > > > > topic offset.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think it's used to allow consumers to acknowledge (can be
> > per
> > > > > > > message)
> > > > > > > > so
> > > > > > > > > offset if wrong terminology here.
> > > > > > > > > For producers, not sure exactly its usage. Maybe if they
> need
> > > to
> > > > > > refer
> > > > > > > to
> > > > > > > > > this message later when reading by Reader interface.
> > > > > > > > > I would correct this section.
> > > > > > > > >
> > > > > > > > > However, right now Pulsar doesn't support correct
> > > deserialization
> > > > > of
> > > > > > > > > > multi-topic or partitioned-topic because of that
> > > 1acknowledge`
> > > > > API
> > > > > > > call
> > > > > > > > > > fails for those topics with below error
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > You're saying that the acknowledgement API method signature
> > > > > receives
> > > > > > > > > MessageId, but do not receive TopicMessageId?
> > > > > > > > >
> > > > > > > > > I have a few questions on that:
> > > > > > > > >
> > > > > > > > > 1. The acknowledgement API is part of Pulsar binary
> protocol.
> > > Is
> > > > > your
> > > > > > > > plan
> > > > > > > > > to alter that protocol so it will also include the topic
> > field
> > > as
> > > > > > part
> > > > > > > of
> > > > > > > > > the message ID?
> > > > > > > > >
> > > > > > > > > 2. I think your PIP needs to explain the following items
> > which
> > > > are
> > > > > > > > missing
> > > > > > > > > as context:
> > > > > > > > > - There are two implementation for MessageId interface, one
> > for
> > > > > topic
> > > > > > > and
> > > > > > > > > one for partitioned topic.
> > > > > > > > > - The problem is that the serialization/desrialization
> method
> > > is
> > > > > used
> > > > > > > > > mainly for translating the ID into the binary protocol,
> which
> > > > only
> > > > > > > > requires
> > > > > > > > > the ID (ledger ID, entry ID).
> > > > > > > > > - The reason for that is that once you created a consumer,
> it
> > > > has a
> > > > > > > topic
> > > > > > > > > attached to it. Transferring the topic for the ack is
> > > redundant.
> > > > > > > > >
> > > > > > > > > All of this needs to be in the background.
> > > > > > > > >
> > > > > > > > > I have several ideas on solving that, which IMO should
> mainly
> > > be
> > > > in
> > > > > > the
> > > > > > > > > client level, but I must get answers to the questions above
> > > > before
> > > > > I
> > > > > > > can
> > > > > > > > > continue.
> > > > > > > > >
> > > > > > > > > Last note
> > > > > > > > > You have basically placed a link to a pull request as the
> > > design
> > > > > > > solution
> > > > > > > > > (high-level/detailed design).
> > > > > > > > > The whole idea of the design is that you describe the
> > solution
> > > > > > without
> > > > > > > > > resorting to code.
> > > > > > > > > IMO you should amend the design, state the goal shortly,
> and
> > > > have a
> > > > > > > high
> > > > > > > > > level design section which contains 1-2 short paragraphs
> > > > describing
> > > > > > > > exactly
> > > > > > > > > your solution.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Asaf
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, May 9, 2023 at 3:24 PM Yunze Xu <xy...@apache.org>
> > > wrote:
> > > > > > > > >
> > > > > > > > > > I'm talking about whether to add a new separate API. I'm
> > > > > concerned
> > > > > > > > > > about whether existing applications would be affected, no
> > > > matter
> > > > > if
> > > > > > > > > > the existing implementation has the limitation. If yes,
> we
> > > > should
> > > > > > > > > > document it in the PIP so that users can know that.
> > > > > > > > > >
> > > > > > > > > > > it's a new optional field which would not break the
> > > > > compatibility
> > > > > > > > > >
> > > > > > > > > > And yes, I just confirmed it with simple demos in my
> local
> > > env.
> > > > > So
> > > > > > > I'm
> > > > > > > > > > +1 to this proposal.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Yunze
> > > > > > > > > >
> > > > > > > > > > On Tue, May 9, 2023 at 3:05 PM Rajan Dhabalia <
> > > > > > rdhabalia@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Weill there are multiple things: it's a new optional
> > field
> > > > > which
> > > > > > > > would
> > > > > > > > > > not
> > > > > > > > > > > break the compatibility , also current messaegId
> > > > serialization
> > > > > > and
> > > > > > > > > > > deserialization anyway only impact multi-topic consumer
> > > which
> > > > > is
> > > > > > > > > already
> > > > > > > > > > > broken or has the limitation and, adding a new separate
> > API
> > > > for
> > > > > > > > > > partitioned
> > > > > > > > > > > topic is not only not acceptable but creates too much
> > > > confusion
> > > > > > for
> > > > > > > > > users
> > > > > > > > > > > to use separate ack APIs for non-partition and
> partition
> > > > topics
> > > > > > and
> > > > > > > > > that
> > > > > > > > > > > also breaks partitioned topic abstraction which we
> would
> > > like
> > > > > to
> > > > > > > > avoid.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Rajan
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 8, 2023 at 11:27 PM Yunze Xu <
> xyz@apache.org
> > >
> > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > It seems that `TopicMessageIdImpl#toByteArray` now
> > could
> > > > > > > serialize
> > > > > > > > > the
> > > > > > > > > > > > optional topic field to the bytes. I didn't test it
> now
> > > > but I
> > > > > > > have
> > > > > > > > a
> > > > > > > > > > > > concern about whether it would bring a breaking
> change.
> > > > > > > > > > > >
> > > > > > > > > > > > Assuming there are two applications (let's say A and
> B)
> > > > based
> > > > > > on
> > > > > > > an
> > > > > > > > > > > > older Pulsar client, A writes serialized bytes into a
> > > > file, B
> > > > > > > reads
> > > > > > > > > > > > bytes from the file and parses it to a MessageId. If
> A
> > > > > upgraded
> > > > > > > its
> > > > > > > > > > > > Pulsar client to the latest while B did not, what
> would
> > > > > happen?
> > > > > > > > Could
> > > > > > > > > > > > B still get the correct MessageId or the bytes would
> > not
> > > be
> > > > > > able
> > > > > > > to
> > > > > > > > > > > > parsed?
> > > > > > > > > > > >
> > > > > > > > > > > > P.S. it's better to add the API changes and potential
> > > > > breaking
> > > > > > > > > changes
> > > > > > > > > > > > in the proposal.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Yunze
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, May 9, 2023 at 1:59 PM Rajan Dhabalia <
> > > > > > > > rdhabalia@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Pulsar api provides MessageId interface which is
> > > > generally
> > > > > > used
> > > > > > > > by
> > > > > > > > > > > > producer
> > > > > > > > > > > > > and consumer applications to manage topic offset.
> > > > > Sometimes,
> > > > > > > > these
> > > > > > > > > > > > > applications would like to serialize and
> deserialize
> > > > > > > messageIds,
> > > > > > > > > > > > > specifically consumer app which would like to
> persist
> > > > > > messageId
> > > > > > > > and
> > > > > > > > > > ack
> > > > > > > > > > > > > with those messageIds by deserializing them.
> However,
> > > > right
> > > > > > now
> > > > > > > > > > Pulsar
> > > > > > > > > > > > > doesn't support correct deserialization of
> > multi-topic
> > > or
> > > > > > > > > > > > partitioned-topic
> > > > > > > > > > > > > because of that 1acknowledge` API call fails for
> > those
> > > > > topics
> > > > > > > > with
> > > > > > > > > > below
> > > > > > > > > > > > > error:
> > > > > > > > > > > > > "Only TopicMessageId is allowed to acknowledge for
> a
> > > > > > > multi-topics
> > > > > > > > > > > > consumer"
> > > > > > > > > > > > >
> > > > > > > > > > > > > MessageIdImpl stores id metadata into MessageIdData
> > > which
> > > > > > > doesn't
> > > > > > > > > > contain
> > > > > > > > > > > > > context about topic name to find out which topic
> > > belongs
> > > > to
> > > > > > > this
> > > > > > > > > > > > MessageID.
> > > > > > > > > > > > > Therefore, we need to add topic-name into
> > MessageIdData
> > > > and
> > > > > > > allow
> > > > > > > > > > > > > multi-topic/partitioned topics to deserialize
> > messages
> > > > > > > correctly
> > > > > > > > > so,
> > > > > > > > > > > > > consumer app can perform as expected.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please visit PIP for any suggestions:
> > > > > > > > > > > > > https://github.com/apache/pulsar/issues/20221
> > > > > > > > > > > > >
> > > > > > > > > > > > > This PIP is created with PR:
> > > > > > > > > > https://github.com/apache/pulsar/pull/19944
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Rajan
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-267: Support multi-topic messageId deserialization to ack messages

Posted by Rajan Dhabalia <rd...@apache.org>.
Hi Asaf,

I really don't know what's your concern but it seems you don't have much
understanding about Pulsar client/server protocol or you really would like
to block the PIP. I tried to answer your concerns but let me try again to
add more context about the implementation if that something can help you:
this PIP makes change only in protobuf of message-id which is in
implementation named as MessageIdData and it uses to serialize and
deserialize messageId for the users. and this PIP is adding a new field to
support messageId deserialization for partition-topic or multi-consumer
topics.
Now, does it impact wire protocol and will the client start sending this
newly added field topic-name to broker? then answer is no because while
sending ack command to broker client creates messageID where it doesn't set
this field [1] and this new field only used during message
serialization/deserialization when client app calls
toByteArray()/fromByteArray() methods. so, this should not add any n/w
overhead for the payload when client sends ack command to broker.
[1]
https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/protocol/Commands.java#L1018

I am not sure if that helps you to answer the question or I should try to
talk about Pulsar client-server protocol implementation here but we can
help you in slack#dev channel if you have more implementation questions.

Thanks,
Rajan



On Tue, Jun 20, 2023 at 11:31 AM Asaf Mesika <as...@gmail.com> wrote:

> On Tue, Jun 20, 2023 at 9:39 AM Rajan Dhabalia <rd...@apache.org>
> wrote:
>
> > > So you say in that sentence that you will add the topic name into
> > MessageIdData. MessageIdData is defined in PulsarApi.proto and is
> > transferred over the wire, so how can you add the topic to this class
> > without changing the wire protocol?
> > Yes, the client creates a separate MessageId while creating a serialized
> > payload for acking where it doesn't set or send topicname and it won't
> > change the payload.
> >
> >
> But it contradicts what you wrote in the design doc. I'm sorry, but I don't
> get it.
> Can you please help me understand this by elaborating so anyone, including
> me, can fully understand it?
> Preferably all your answers should be injected into the document, of
> course.
>
> Thanks!
>
> Asaf
>
>
>
> > Thanks,
> > Rajan
> >
> > On Mon, Jun 19, 2023 at 5:45 AM Asaf Mesika <as...@gmail.com>
> wrote:
> >
> > > First, let me add some data that should be added to the Background
> > section
> > > of the PIP since I had to reverse engineer the code to understand that,
> > > which is the opposite of the goal of a design document.
> > >
> > > ----
> > > Pulsar Broker has a binary protocol, which allows the client to consume
> > > messages, acknowledge them, and much more. The protocol comprises
> > Commands
> > > containing the data needed to apply that Command on the broker side.
> Many
> > > commands allow a consumer (client) to acknowledge messages, among them:
> > > CommandSendReceipt, CommandSend, CommandAck, and more. All those
> commands
> > > use the message type MessageIdData to specify the details of the
> message
> > to
> > > acknowledge.
> > >
> > > Here's what this data structure looks like:
> > > message MessageIdData {
> > > required uint64 ledgerId = 1;
> > > required uint64 entryId = 2;
> > > optional int32 partition = 3 [default = -1];
> > > optional int32 batch_index = 4 [default = -1];
> > > repeated int64 ack_set = 5;
> > > optional int32 batch_size = 6;
> > >
> > > // For the chunk message id, we need to specify the first chunk message
> > id.
> > > optional MessageIdData first_chunk_message_id = 7;
> > > }
> > >
> > > The key fields are the ledgerID at which the message is contained and
> > > entryId, which indicates the offset inside the ledger (message number).
> > >
> > > The client uses a class named MessageIdData which is the auto-generated
> > > code representing the message MessageIdData.
> > > ---------
> > >
> > > Now, in the design, you wrote:
> > >
> > > > Thefore, we need to add topic-name into MessageIdData and allow
> > > > multi-topic/partitioned topic to deserialize message correctly so,
> API
> > > like
> > > > acknowledge can perform as expected.
> > >
> > >
> > > So you say in that sentence that you will add the topic name into
> > > MessageIdData.
> > > MessageIdData is defined in PulsarApi.proto and is transferred over the
> > > wire, so how can you add the topic to this class without changing the
> > wire
> > > protocol?
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jun 16, 2023 at 10:47 PM Rajan Dhabalia <rd...@apache.org>
> > > wrote:
> > >
> > > > Yes, the topic name will not be transferred and it's not part of the
> > wire
> > > > protocol. Message uses MessageID protobuf data-structure to serialize
> > and
> > > > deserialize MessageId and it doesn't change any behavior nor will
> > > transfer
> > > > any additional fields to the broker. and I would not like to
> introduce
> > > any
> > > > additional data-structure as that will create data copy, field
> > > > inconsistencies, and more garbage due to more object allocation and
> > > that's
> > > > something we would like to avoid.
> > > >
> > > > Thanks,
> > > > Rajan
> > > >
> > > > On Mon, May 15, 2023 at 6:18 AM PengHui Li <pe...@apache.org>
> wrote:
> > > >
> > > > > I think the topic name will not be transmitted to the broker.
> > > > > The client side used the class generated by the protobuf message.
> > > > > Or, we can create another class to avoid coupling issues, but it
> > > > > will introduce more changes and copy data from one structure
> > > > > to another. For the long-term, I think it should be a good way if
> > > > > we don't have blockers with this solution. Because I don't think
> > > > > there is a higher priority in the long run than keeping the
> protocol
> > > > clear.
> > > > >
> > > > > If the above options are not feasible. At least, we should clarify
> > > > > it in the proposal and add comments in the proto file to avoid
> > > > > other clients transmitting the topic name to the broker.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Fri, May 12, 2023 at 5:59 PM Asaf Mesika <asaf.mesika@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > I don't get it - you say msgId is a data structure contained
> within
> > > > > > MessageId implementation, right? I presume msgId is the data
> > > structure
> > > > > the
> > > > > > client transmit to the server, so that means you are transmitting
> > > topic
> > > > > to
> > > > > > the server?
> > > > > >
> > > > > >
> > > > > > On Fri, May 12, 2023 at 7:45 AM Rajan Dhabalia <
> > rdhabalia@apache.org
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thank you for sharing your knowledge about the PIP which should
> > be
> > > > > > created
> > > > > > > before PR and I think everyone in the community knows about it.
> > but
> > > > you
> > > > > > can
> > > > > > > check the PR for context which was blocked for sometime and we
> > > > decided
> > > > > to
> > > > > > > create PIP with proto changes.
> > > > > > >
> > > > > > > This PIP/PR tries to fix the issue where partitioned topic
> fails
> > > > while
> > > > > > > acking deserialized messageId. topic name will be part of
> > MsgIdData
> > > > > which
> > > > > > > is the data-structure used by messageID to store msgID context
> > > along
> > > > > with
> > > > > > > partition, batching, and other metadata. topic name will be
> > > attached
> > > > > only
> > > > > > > when the user tries to serialize and deserialize the messageId
> > > which
> > > > > will
> > > > > > > be purely client side implementation and in other cases it will
> > not
> > > > be
> > > > > > > transmitted to server. Also, partitioned topic's abstract
> concept
> > > for
> > > > > > user
> > > > > > > and messageID must be also remain abstract for users and users
> > must
> > > > not
> > > > > > > know about different implementation of messageId and our goal
> is
> > to
> > > > > > > maintain that abstraction without telling user about
> > MessageIdImpl
> > > or
> > > > > > > TopicMessageIdImpl.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Rajan
> > > > > > >
> > > > > > >
> > > > > > > On Thu, May 11, 2023 at 7:29 AM Asaf Mesika <
> > asaf.mesika@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Rajan,
> > > > > > > >
> > > > > > > > A few comments on the PIP as I couldn't understand it fully
> as
> > > some
> > > > > > > pieces
> > > > > > > > of information is missing.
> > > > > > > >
> > > > > > > > First, I would like to remind about the rules, that exists in
> > the
> > > > > > > beginning
> > > > > > > > of the PIP template:
> > > > > > > >
> > > > > > > > <!--
> > > > > > > > RULES
> > > > > > > > * Never place a link to an external site like Google Doc. The
> > > > > proposal
> > > > > > > > should be in this issue entirely.
> > > > > > > > * Use a spelling and grammar checker tools if available for
> you
> > > > > (there
> > > > > > > are
> > > > > > > > plenty of free ones)
> > > > > > > >
> > > > > > > > PROPOSAL HEALTH CHECK
> > > > > > > > I can read the design document and understand the problem
> > > statement
> > > > > and
> > > > > > > > what you plan to change *without* resorting to a couple of
> > hours
> > > of
> > > > > > code
> > > > > > > > reading just to start having a high level understanding of
> the
> > > > > change.
> > > > > > > > -->
> > > > > > > >
> > > > > > > >
> > > > > > > > In this specific case
> > > > > > > > 1. I would include explanation and detail the data structures
> > > > fields
> > > > > of
> > > > > > > > objects you mentioned, such as: MessageIdImpl and
> > MessageIdData.
> > > > > > > > 2. I would not put a PR as the design section, so I need to
> > read
> > > > code
> > > > > > to
> > > > > > > > understand what the exact solution details.
> > > > > > > >
> > > > > > > > You wrote:
> > > > > > > >
> > > > > > > > > Pulsar api provides MessageId
> > > > > > > > > <
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/MessageId.java
> > > > > > > >
> > > > > > > > interface
> > > > > > > > > which is generally used by producer and consumer
> applications
> > > to
> > > > > > manage
> > > > > > > > > topic offset.
> > > > > > > >
> > > > > > > >
> > > > > > > > I think it's used to allow consumers to acknowledge (can be
> per
> > > > > > message)
> > > > > > > so
> > > > > > > > offset if wrong terminology here.
> > > > > > > > For producers, not sure exactly its usage. Maybe if they need
> > to
> > > > > refer
> > > > > > to
> > > > > > > > this message later when reading by Reader interface.
> > > > > > > > I would correct this section.
> > > > > > > >
> > > > > > > > However, right now Pulsar doesn't support correct
> > deserialization
> > > > of
> > > > > > > > > multi-topic or partitioned-topic because of that
> > 1acknowledge`
> > > > API
> > > > > > call
> > > > > > > > > fails for those topics with below error
> > > > > > > >
> > > > > > > >
> > > > > > > > You're saying that the acknowledgement API method signature
> > > > receives
> > > > > > > > MessageId, but do not receive TopicMessageId?
> > > > > > > >
> > > > > > > > I have a few questions on that:
> > > > > > > >
> > > > > > > > 1. The acknowledgement API is part of Pulsar binary protocol.
> > Is
> > > > your
> > > > > > > plan
> > > > > > > > to alter that protocol so it will also include the topic
> field
> > as
> > > > > part
> > > > > > of
> > > > > > > > the message ID?
> > > > > > > >
> > > > > > > > 2. I think your PIP needs to explain the following items
> which
> > > are
> > > > > > > missing
> > > > > > > > as context:
> > > > > > > > - There are two implementation for MessageId interface, one
> for
> > > > topic
> > > > > > and
> > > > > > > > one for partitioned topic.
> > > > > > > > - The problem is that the serialization/desrialization method
> > is
> > > > used
> > > > > > > > mainly for translating the ID into the binary protocol, which
> > > only
> > > > > > > requires
> > > > > > > > the ID (ledger ID, entry ID).
> > > > > > > > - The reason for that is that once you created a consumer, it
> > > has a
> > > > > > topic
> > > > > > > > attached to it. Transferring the topic for the ack is
> > redundant.
> > > > > > > >
> > > > > > > > All of this needs to be in the background.
> > > > > > > >
> > > > > > > > I have several ideas on solving that, which IMO should mainly
> > be
> > > in
> > > > > the
> > > > > > > > client level, but I must get answers to the questions above
> > > before
> > > > I
> > > > > > can
> > > > > > > > continue.
> > > > > > > >
> > > > > > > > Last note
> > > > > > > > You have basically placed a link to a pull request as the
> > design
> > > > > > solution
> > > > > > > > (high-level/detailed design).
> > > > > > > > The whole idea of the design is that you describe the
> solution
> > > > > without
> > > > > > > > resorting to code.
> > > > > > > > IMO you should amend the design, state the goal shortly, and
> > > have a
> > > > > > high
> > > > > > > > level design section which contains 1-2 short paragraphs
> > > describing
> > > > > > > exactly
> > > > > > > > your solution.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Asaf
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, May 9, 2023 at 3:24 PM Yunze Xu <xy...@apache.org>
> > wrote:
> > > > > > > >
> > > > > > > > > I'm talking about whether to add a new separate API. I'm
> > > > concerned
> > > > > > > > > about whether existing applications would be affected, no
> > > matter
> > > > if
> > > > > > > > > the existing implementation has the limitation. If yes, we
> > > should
> > > > > > > > > document it in the PIP so that users can know that.
> > > > > > > > >
> > > > > > > > > > it's a new optional field which would not break the
> > > > compatibility
> > > > > > > > >
> > > > > > > > > And yes, I just confirmed it with simple demos in my local
> > env.
> > > > So
> > > > > > I'm
> > > > > > > > > +1 to this proposal.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Yunze
> > > > > > > > >
> > > > > > > > > On Tue, May 9, 2023 at 3:05 PM Rajan Dhabalia <
> > > > > rdhabalia@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Weill there are multiple things: it's a new optional
> field
> > > > which
> > > > > > > would
> > > > > > > > > not
> > > > > > > > > > break the compatibility , also current messaegId
> > > serialization
> > > > > and
> > > > > > > > > > deserialization anyway only impact multi-topic consumer
> > which
> > > > is
> > > > > > > > already
> > > > > > > > > > broken or has the limitation and, adding a new separate
> API
> > > for
> > > > > > > > > partitioned
> > > > > > > > > > topic is not only not acceptable but creates too much
> > > confusion
> > > > > for
> > > > > > > > users
> > > > > > > > > > to use separate ack APIs for non-partition and partition
> > > topics
> > > > > and
> > > > > > > > that
> > > > > > > > > > also breaks partitioned topic abstraction which we would
> > like
> > > > to
> > > > > > > avoid.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Rajan
> > > > > > > > > >
> > > > > > > > > > On Mon, May 8, 2023 at 11:27 PM Yunze Xu <xyz@apache.org
> >
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > It seems that `TopicMessageIdImpl#toByteArray` now
> could
> > > > > > serialize
> > > > > > > > the
> > > > > > > > > > > optional topic field to the bytes. I didn't test it now
> > > but I
> > > > > > have
> > > > > > > a
> > > > > > > > > > > concern about whether it would bring a breaking change.
> > > > > > > > > > >
> > > > > > > > > > > Assuming there are two applications (let's say A and B)
> > > based
> > > > > on
> > > > > > an
> > > > > > > > > > > older Pulsar client, A writes serialized bytes into a
> > > file, B
> > > > > > reads
> > > > > > > > > > > bytes from the file and parses it to a MessageId. If A
> > > > upgraded
> > > > > > its
> > > > > > > > > > > Pulsar client to the latest while B did not, what would
> > > > happen?
> > > > > > > Could
> > > > > > > > > > > B still get the correct MessageId or the bytes would
> not
> > be
> > > > > able
> > > > > > to
> > > > > > > > > > > parsed?
> > > > > > > > > > >
> > > > > > > > > > > P.S. it's better to add the API changes and potential
> > > > breaking
> > > > > > > > changes
> > > > > > > > > > > in the proposal.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Yunze
> > > > > > > > > > >
> > > > > > > > > > > On Tue, May 9, 2023 at 1:59 PM Rajan Dhabalia <
> > > > > > > rdhabalia@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > Pulsar api provides MessageId interface which is
> > > generally
> > > > > used
> > > > > > > by
> > > > > > > > > > > producer
> > > > > > > > > > > > and consumer applications to manage topic offset.
> > > > Sometimes,
> > > > > > > these
> > > > > > > > > > > > applications would like to serialize and deserialize
> > > > > > messageIds,
> > > > > > > > > > > > specifically consumer app which would like to persist
> > > > > messageId
> > > > > > > and
> > > > > > > > > ack
> > > > > > > > > > > > with those messageIds by deserializing them. However,
> > > right
> > > > > now
> > > > > > > > > Pulsar
> > > > > > > > > > > > doesn't support correct deserialization of
> multi-topic
> > or
> > > > > > > > > > > partitioned-topic
> > > > > > > > > > > > because of that 1acknowledge` API call fails for
> those
> > > > topics
> > > > > > > with
> > > > > > > > > below
> > > > > > > > > > > > error:
> > > > > > > > > > > > "Only TopicMessageId is allowed to acknowledge for a
> > > > > > multi-topics
> > > > > > > > > > > consumer"
> > > > > > > > > > > >
> > > > > > > > > > > > MessageIdImpl stores id metadata into MessageIdData
> > which
> > > > > > doesn't
> > > > > > > > > contain
> > > > > > > > > > > > context about topic name to find out which topic
> > belongs
> > > to
> > > > > > this
> > > > > > > > > > > MessageID.
> > > > > > > > > > > > Therefore, we need to add topic-name into
> MessageIdData
> > > and
> > > > > > allow
> > > > > > > > > > > > multi-topic/partitioned topics to deserialize
> messages
> > > > > > correctly
> > > > > > > > so,
> > > > > > > > > > > > consumer app can perform as expected.
> > > > > > > > > > > >
> > > > > > > > > > > > Please visit PIP for any suggestions:
> > > > > > > > > > > > https://github.com/apache/pulsar/issues/20221
> > > > > > > > > > > >
> > > > > > > > > > > > This PIP is created with PR:
> > > > > > > > > https://github.com/apache/pulsar/pull/19944
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Rajan
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-267: Support multi-topic messageId deserialization to ack messages

Posted by Asaf Mesika <as...@gmail.com>.
On Tue, Jun 20, 2023 at 9:39 AM Rajan Dhabalia <rd...@apache.org> wrote:

> > So you say in that sentence that you will add the topic name into
> MessageIdData. MessageIdData is defined in PulsarApi.proto and is
> transferred over the wire, so how can you add the topic to this class
> without changing the wire protocol?
> Yes, the client creates a separate MessageId while creating a serialized
> payload for acking where it doesn't set or send topicname and it won't
> change the payload.
>
>
But it contradicts what you wrote in the design doc. I'm sorry, but I don't
get it.
Can you please help me understand this by elaborating so anyone, including
me, can fully understand it?
Preferably all your answers should be injected into the document, of course.

Thanks!

Asaf



> Thanks,
> Rajan
>
> On Mon, Jun 19, 2023 at 5:45 AM Asaf Mesika <as...@gmail.com> wrote:
>
> > First, let me add some data that should be added to the Background
> section
> > of the PIP since I had to reverse engineer the code to understand that,
> > which is the opposite of the goal of a design document.
> >
> > ----
> > Pulsar Broker has a binary protocol, which allows the client to consume
> > messages, acknowledge them, and much more. The protocol comprises
> Commands
> > containing the data needed to apply that Command on the broker side. Many
> > commands allow a consumer (client) to acknowledge messages, among them:
> > CommandSendReceipt, CommandSend, CommandAck, and more. All those commands
> > use the message type MessageIdData to specify the details of the message
> to
> > acknowledge.
> >
> > Here's what this data structure looks like:
> > message MessageIdData {
> > required uint64 ledgerId = 1;
> > required uint64 entryId = 2;
> > optional int32 partition = 3 [default = -1];
> > optional int32 batch_index = 4 [default = -1];
> > repeated int64 ack_set = 5;
> > optional int32 batch_size = 6;
> >
> > // For the chunk message id, we need to specify the first chunk message
> id.
> > optional MessageIdData first_chunk_message_id = 7;
> > }
> >
> > The key fields are the ledgerID at which the message is contained and
> > entryId, which indicates the offset inside the ledger (message number).
> >
> > The client uses a class named MessageIdData which is the auto-generated
> > code representing the message MessageIdData.
> > ---------
> >
> > Now, in the design, you wrote:
> >
> > > Thefore, we need to add topic-name into MessageIdData and allow
> > > multi-topic/partitioned topic to deserialize message correctly so, API
> > like
> > > acknowledge can perform as expected.
> >
> >
> > So you say in that sentence that you will add the topic name into
> > MessageIdData.
> > MessageIdData is defined in PulsarApi.proto and is transferred over the
> > wire, so how can you add the topic to this class without changing the
> wire
> > protocol?
> >
> >
> >
> >
> >
> > On Fri, Jun 16, 2023 at 10:47 PM Rajan Dhabalia <rd...@apache.org>
> > wrote:
> >
> > > Yes, the topic name will not be transferred and it's not part of the
> wire
> > > protocol. Message uses MessageID protobuf data-structure to serialize
> and
> > > deserialize MessageId and it doesn't change any behavior nor will
> > transfer
> > > any additional fields to the broker. and I would not like to introduce
> > any
> > > additional data-structure as that will create data copy, field
> > > inconsistencies, and more garbage due to more object allocation and
> > that's
> > > something we would like to avoid.
> > >
> > > Thanks,
> > > Rajan
> > >
> > > On Mon, May 15, 2023 at 6:18 AM PengHui Li <pe...@apache.org> wrote:
> > >
> > > > I think the topic name will not be transmitted to the broker.
> > > > The client side used the class generated by the protobuf message.
> > > > Or, we can create another class to avoid coupling issues, but it
> > > > will introduce more changes and copy data from one structure
> > > > to another. For the long-term, I think it should be a good way if
> > > > we don't have blockers with this solution. Because I don't think
> > > > there is a higher priority in the long run than keeping the protocol
> > > clear.
> > > >
> > > > If the above options are not feasible. At least, we should clarify
> > > > it in the proposal and add comments in the proto file to avoid
> > > > other clients transmitting the topic name to the broker.
> > > >
> > > > Thanks,
> > > > Penghui
> > > >
> > > > On Fri, May 12, 2023 at 5:59 PM Asaf Mesika <as...@gmail.com>
> > > wrote:
> > > >
> > > > > I don't get it - you say msgId is a data structure contained within
> > > > > MessageId implementation, right? I presume msgId is the data
> > structure
> > > > the
> > > > > client transmit to the server, so that means you are transmitting
> > topic
> > > > to
> > > > > the server?
> > > > >
> > > > >
> > > > > On Fri, May 12, 2023 at 7:45 AM Rajan Dhabalia <
> rdhabalia@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > Thank you for sharing your knowledge about the PIP which should
> be
> > > > > created
> > > > > > before PR and I think everyone in the community knows about it.
> but
> > > you
> > > > > can
> > > > > > check the PR for context which was blocked for sometime and we
> > > decided
> > > > to
> > > > > > create PIP with proto changes.
> > > > > >
> > > > > > This PIP/PR tries to fix the issue where partitioned topic fails
> > > while
> > > > > > acking deserialized messageId. topic name will be part of
> MsgIdData
> > > > which
> > > > > > is the data-structure used by messageID to store msgID context
> > along
> > > > with
> > > > > > partition, batching, and other metadata. topic name will be
> > attached
> > > > only
> > > > > > when the user tries to serialize and deserialize the messageId
> > which
> > > > will
> > > > > > be purely client side implementation and in other cases it will
> not
> > > be
> > > > > > transmitted to server. Also, partitioned topic's abstract concept
> > for
> > > > > user
> > > > > > and messageID must be also remain abstract for users and users
> must
> > > not
> > > > > > know about different implementation of messageId and our goal is
> to
> > > > > > maintain that abstraction without telling user about
> MessageIdImpl
> > or
> > > > > > TopicMessageIdImpl.
> > > > > >
> > > > > > Thanks,
> > > > > > Rajan
> > > > > >
> > > > > >
> > > > > > On Thu, May 11, 2023 at 7:29 AM Asaf Mesika <
> asaf.mesika@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Rajan,
> > > > > > >
> > > > > > > A few comments on the PIP as I couldn't understand it fully as
> > some
> > > > > > pieces
> > > > > > > of information is missing.
> > > > > > >
> > > > > > > First, I would like to remind about the rules, that exists in
> the
> > > > > > beginning
> > > > > > > of the PIP template:
> > > > > > >
> > > > > > > <!--
> > > > > > > RULES
> > > > > > > * Never place a link to an external site like Google Doc. The
> > > > proposal
> > > > > > > should be in this issue entirely.
> > > > > > > * Use a spelling and grammar checker tools if available for you
> > > > (there
> > > > > > are
> > > > > > > plenty of free ones)
> > > > > > >
> > > > > > > PROPOSAL HEALTH CHECK
> > > > > > > I can read the design document and understand the problem
> > statement
> > > > and
> > > > > > > what you plan to change *without* resorting to a couple of
> hours
> > of
> > > > > code
> > > > > > > reading just to start having a high level understanding of the
> > > > change.
> > > > > > > -->
> > > > > > >
> > > > > > >
> > > > > > > In this specific case
> > > > > > > 1. I would include explanation and detail the data structures
> > > fields
> > > > of
> > > > > > > objects you mentioned, such as: MessageIdImpl and
> MessageIdData.
> > > > > > > 2. I would not put a PR as the design section, so I need to
> read
> > > code
> > > > > to
> > > > > > > understand what the exact solution details.
> > > > > > >
> > > > > > > You wrote:
> > > > > > >
> > > > > > > > Pulsar api provides MessageId
> > > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/MessageId.java
> > > > > > >
> > > > > > > interface
> > > > > > > > which is generally used by producer and consumer applications
> > to
> > > > > manage
> > > > > > > > topic offset.
> > > > > > >
> > > > > > >
> > > > > > > I think it's used to allow consumers to acknowledge (can be per
> > > > > message)
> > > > > > so
> > > > > > > offset if wrong terminology here.
> > > > > > > For producers, not sure exactly its usage. Maybe if they need
> to
> > > > refer
> > > > > to
> > > > > > > this message later when reading by Reader interface.
> > > > > > > I would correct this section.
> > > > > > >
> > > > > > > However, right now Pulsar doesn't support correct
> deserialization
> > > of
> > > > > > > > multi-topic or partitioned-topic because of that
> 1acknowledge`
> > > API
> > > > > call
> > > > > > > > fails for those topics with below error
> > > > > > >
> > > > > > >
> > > > > > > You're saying that the acknowledgement API method signature
> > > receives
> > > > > > > MessageId, but do not receive TopicMessageId?
> > > > > > >
> > > > > > > I have a few questions on that:
> > > > > > >
> > > > > > > 1. The acknowledgement API is part of Pulsar binary protocol.
> Is
> > > your
> > > > > > plan
> > > > > > > to alter that protocol so it will also include the topic field
> as
> > > > part
> > > > > of
> > > > > > > the message ID?
> > > > > > >
> > > > > > > 2. I think your PIP needs to explain the following items which
> > are
> > > > > > missing
> > > > > > > as context:
> > > > > > > - There are two implementation for MessageId interface, one for
> > > topic
> > > > > and
> > > > > > > one for partitioned topic.
> > > > > > > - The problem is that the serialization/desrialization method
> is
> > > used
> > > > > > > mainly for translating the ID into the binary protocol, which
> > only
> > > > > > requires
> > > > > > > the ID (ledger ID, entry ID).
> > > > > > > - The reason for that is that once you created a consumer, it
> > has a
> > > > > topic
> > > > > > > attached to it. Transferring the topic for the ack is
> redundant.
> > > > > > >
> > > > > > > All of this needs to be in the background.
> > > > > > >
> > > > > > > I have several ideas on solving that, which IMO should mainly
> be
> > in
> > > > the
> > > > > > > client level, but I must get answers to the questions above
> > before
> > > I
> > > > > can
> > > > > > > continue.
> > > > > > >
> > > > > > > Last note
> > > > > > > You have basically placed a link to a pull request as the
> design
> > > > > solution
> > > > > > > (high-level/detailed design).
> > > > > > > The whole idea of the design is that you describe the solution
> > > > without
> > > > > > > resorting to code.
> > > > > > > IMO you should amend the design, state the goal shortly, and
> > have a
> > > > > high
> > > > > > > level design section which contains 1-2 short paragraphs
> > describing
> > > > > > exactly
> > > > > > > your solution.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Asaf
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, May 9, 2023 at 3:24 PM Yunze Xu <xy...@apache.org>
> wrote:
> > > > > > >
> > > > > > > > I'm talking about whether to add a new separate API. I'm
> > > concerned
> > > > > > > > about whether existing applications would be affected, no
> > matter
> > > if
> > > > > > > > the existing implementation has the limitation. If yes, we
> > should
> > > > > > > > document it in the PIP so that users can know that.
> > > > > > > >
> > > > > > > > > it's a new optional field which would not break the
> > > compatibility
> > > > > > > >
> > > > > > > > And yes, I just confirmed it with simple demos in my local
> env.
> > > So
> > > > > I'm
> > > > > > > > +1 to this proposal.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Yunze
> > > > > > > >
> > > > > > > > On Tue, May 9, 2023 at 3:05 PM Rajan Dhabalia <
> > > > rdhabalia@apache.org>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Weill there are multiple things: it's a new optional field
> > > which
> > > > > > would
> > > > > > > > not
> > > > > > > > > break the compatibility , also current messaegId
> > serialization
> > > > and
> > > > > > > > > deserialization anyway only impact multi-topic consumer
> which
> > > is
> > > > > > > already
> > > > > > > > > broken or has the limitation and, adding a new separate API
> > for
> > > > > > > > partitioned
> > > > > > > > > topic is not only not acceptable but creates too much
> > confusion
> > > > for
> > > > > > > users
> > > > > > > > > to use separate ack APIs for non-partition and partition
> > topics
> > > > and
> > > > > > > that
> > > > > > > > > also breaks partitioned topic abstraction which we would
> like
> > > to
> > > > > > avoid.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Rajan
> > > > > > > > >
> > > > > > > > > On Mon, May 8, 2023 at 11:27 PM Yunze Xu <xy...@apache.org>
> > > wrote:
> > > > > > > > >
> > > > > > > > > > It seems that `TopicMessageIdImpl#toByteArray` now could
> > > > > serialize
> > > > > > > the
> > > > > > > > > > optional topic field to the bytes. I didn't test it now
> > but I
> > > > > have
> > > > > > a
> > > > > > > > > > concern about whether it would bring a breaking change.
> > > > > > > > > >
> > > > > > > > > > Assuming there are two applications (let's say A and B)
> > based
> > > > on
> > > > > an
> > > > > > > > > > older Pulsar client, A writes serialized bytes into a
> > file, B
> > > > > reads
> > > > > > > > > > bytes from the file and parses it to a MessageId. If A
> > > upgraded
> > > > > its
> > > > > > > > > > Pulsar client to the latest while B did not, what would
> > > happen?
> > > > > > Could
> > > > > > > > > > B still get the correct MessageId or the bytes would not
> be
> > > > able
> > > > > to
> > > > > > > > > > parsed?
> > > > > > > > > >
> > > > > > > > > > P.S. it's better to add the API changes and potential
> > > breaking
> > > > > > > changes
> > > > > > > > > > in the proposal.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Yunze
> > > > > > > > > >
> > > > > > > > > > On Tue, May 9, 2023 at 1:59 PM Rajan Dhabalia <
> > > > > > rdhabalia@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > Pulsar api provides MessageId interface which is
> > generally
> > > > used
> > > > > > by
> > > > > > > > > > producer
> > > > > > > > > > > and consumer applications to manage topic offset.
> > > Sometimes,
> > > > > > these
> > > > > > > > > > > applications would like to serialize and deserialize
> > > > > messageIds,
> > > > > > > > > > > specifically consumer app which would like to persist
> > > > messageId
> > > > > > and
> > > > > > > > ack
> > > > > > > > > > > with those messageIds by deserializing them. However,
> > right
> > > > now
> > > > > > > > Pulsar
> > > > > > > > > > > doesn't support correct deserialization of multi-topic
> or
> > > > > > > > > > partitioned-topic
> > > > > > > > > > > because of that 1acknowledge` API call fails for those
> > > topics
> > > > > > with
> > > > > > > > below
> > > > > > > > > > > error:
> > > > > > > > > > > "Only TopicMessageId is allowed to acknowledge for a
> > > > > multi-topics
> > > > > > > > > > consumer"
> > > > > > > > > > >
> > > > > > > > > > > MessageIdImpl stores id metadata into MessageIdData
> which
> > > > > doesn't
> > > > > > > > contain
> > > > > > > > > > > context about topic name to find out which topic
> belongs
> > to
> > > > > this
> > > > > > > > > > MessageID.
> > > > > > > > > > > Therefore, we need to add topic-name into MessageIdData
> > and
> > > > > allow
> > > > > > > > > > > multi-topic/partitioned topics to deserialize messages
> > > > > correctly
> > > > > > > so,
> > > > > > > > > > > consumer app can perform as expected.
> > > > > > > > > > >
> > > > > > > > > > > Please visit PIP for any suggestions:
> > > > > > > > > > > https://github.com/apache/pulsar/issues/20221
> > > > > > > > > > >
> > > > > > > > > > > This PIP is created with PR:
> > > > > > > > https://github.com/apache/pulsar/pull/19944
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Rajan
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-267: Support multi-topic messageId deserialization to ack messages

Posted by Rajan Dhabalia <rd...@apache.org>.
> So you say in that sentence that you will add the topic name into
MessageIdData. MessageIdData is defined in PulsarApi.proto and is
transferred over the wire, so how can you add the topic to this class
without changing the wire protocol?
Yes, the client creates a separate MessageId while creating a serialized
payload for acking where it doesn't set or send topicname and it won't
change the payload.

Thanks,
Rajan

On Mon, Jun 19, 2023 at 5:45 AM Asaf Mesika <as...@gmail.com> wrote:

> First, let me add some data that should be added to the Background section
> of the PIP since I had to reverse engineer the code to understand that,
> which is the opposite of the goal of a design document.
>
> ----
> Pulsar Broker has a binary protocol, which allows the client to consume
> messages, acknowledge them, and much more. The protocol comprises Commands
> containing the data needed to apply that Command on the broker side. Many
> commands allow a consumer (client) to acknowledge messages, among them:
> CommandSendReceipt, CommandSend, CommandAck, and more. All those commands
> use the message type MessageIdData to specify the details of the message to
> acknowledge.
>
> Here's what this data structure looks like:
> message MessageIdData {
> required uint64 ledgerId = 1;
> required uint64 entryId = 2;
> optional int32 partition = 3 [default = -1];
> optional int32 batch_index = 4 [default = -1];
> repeated int64 ack_set = 5;
> optional int32 batch_size = 6;
>
> // For the chunk message id, we need to specify the first chunk message id.
> optional MessageIdData first_chunk_message_id = 7;
> }
>
> The key fields are the ledgerID at which the message is contained and
> entryId, which indicates the offset inside the ledger (message number).
>
> The client uses a class named MessageIdData which is the auto-generated
> code representing the message MessageIdData.
> ---------
>
> Now, in the design, you wrote:
>
> > Thefore, we need to add topic-name into MessageIdData and allow
> > multi-topic/partitioned topic to deserialize message correctly so, API
> like
> > acknowledge can perform as expected.
>
>
> So you say in that sentence that you will add the topic name into
> MessageIdData.
> MessageIdData is defined in PulsarApi.proto and is transferred over the
> wire, so how can you add the topic to this class without changing the wire
> protocol?
>
>
>
>
>
> On Fri, Jun 16, 2023 at 10:47 PM Rajan Dhabalia <rd...@apache.org>
> wrote:
>
> > Yes, the topic name will not be transferred and it's not part of the wire
> > protocol. Message uses MessageID protobuf data-structure to serialize and
> > deserialize MessageId and it doesn't change any behavior nor will
> transfer
> > any additional fields to the broker. and I would not like to introduce
> any
> > additional data-structure as that will create data copy, field
> > inconsistencies, and more garbage due to more object allocation and
> that's
> > something we would like to avoid.
> >
> > Thanks,
> > Rajan
> >
> > On Mon, May 15, 2023 at 6:18 AM PengHui Li <pe...@apache.org> wrote:
> >
> > > I think the topic name will not be transmitted to the broker.
> > > The client side used the class generated by the protobuf message.
> > > Or, we can create another class to avoid coupling issues, but it
> > > will introduce more changes and copy data from one structure
> > > to another. For the long-term, I think it should be a good way if
> > > we don't have blockers with this solution. Because I don't think
> > > there is a higher priority in the long run than keeping the protocol
> > clear.
> > >
> > > If the above options are not feasible. At least, we should clarify
> > > it in the proposal and add comments in the proto file to avoid
> > > other clients transmitting the topic name to the broker.
> > >
> > > Thanks,
> > > Penghui
> > >
> > > On Fri, May 12, 2023 at 5:59 PM Asaf Mesika <as...@gmail.com>
> > wrote:
> > >
> > > > I don't get it - you say msgId is a data structure contained within
> > > > MessageId implementation, right? I presume msgId is the data
> structure
> > > the
> > > > client transmit to the server, so that means you are transmitting
> topic
> > > to
> > > > the server?
> > > >
> > > >
> > > > On Fri, May 12, 2023 at 7:45 AM Rajan Dhabalia <rdhabalia@apache.org
> >
> > > > wrote:
> > > >
> > > > > Thank you for sharing your knowledge about the PIP which should be
> > > > created
> > > > > before PR and I think everyone in the community knows about it. but
> > you
> > > > can
> > > > > check the PR for context which was blocked for sometime and we
> > decided
> > > to
> > > > > create PIP with proto changes.
> > > > >
> > > > > This PIP/PR tries to fix the issue where partitioned topic fails
> > while
> > > > > acking deserialized messageId. topic name will be part of MsgIdData
> > > which
> > > > > is the data-structure used by messageID to store msgID context
> along
> > > with
> > > > > partition, batching, and other metadata. topic name will be
> attached
> > > only
> > > > > when the user tries to serialize and deserialize the messageId
> which
> > > will
> > > > > be purely client side implementation and in other cases it will not
> > be
> > > > > transmitted to server. Also, partitioned topic's abstract concept
> for
> > > > user
> > > > > and messageID must be also remain abstract for users and users must
> > not
> > > > > know about different implementation of messageId and our goal is to
> > > > > maintain that abstraction without telling user about MessageIdImpl
> or
> > > > > TopicMessageIdImpl.
> > > > >
> > > > > Thanks,
> > > > > Rajan
> > > > >
> > > > >
> > > > > On Thu, May 11, 2023 at 7:29 AM Asaf Mesika <asaf.mesika@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi Rajan,
> > > > > >
> > > > > > A few comments on the PIP as I couldn't understand it fully as
> some
> > > > > pieces
> > > > > > of information is missing.
> > > > > >
> > > > > > First, I would like to remind about the rules, that exists in the
> > > > > beginning
> > > > > > of the PIP template:
> > > > > >
> > > > > > <!--
> > > > > > RULES
> > > > > > * Never place a link to an external site like Google Doc. The
> > > proposal
> > > > > > should be in this issue entirely.
> > > > > > * Use a spelling and grammar checker tools if available for you
> > > (there
> > > > > are
> > > > > > plenty of free ones)
> > > > > >
> > > > > > PROPOSAL HEALTH CHECK
> > > > > > I can read the design document and understand the problem
> statement
> > > and
> > > > > > what you plan to change *without* resorting to a couple of hours
> of
> > > > code
> > > > > > reading just to start having a high level understanding of the
> > > change.
> > > > > > -->
> > > > > >
> > > > > >
> > > > > > In this specific case
> > > > > > 1. I would include explanation and detail the data structures
> > fields
> > > of
> > > > > > objects you mentioned, such as: MessageIdImpl and MessageIdData.
> > > > > > 2. I would not put a PR as the design section, so I need to read
> > code
> > > > to
> > > > > > understand what the exact solution details.
> > > > > >
> > > > > > You wrote:
> > > > > >
> > > > > > > Pulsar api provides MessageId
> > > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/MessageId.java
> > > > > >
> > > > > > interface
> > > > > > > which is generally used by producer and consumer applications
> to
> > > > manage
> > > > > > > topic offset.
> > > > > >
> > > > > >
> > > > > > I think it's used to allow consumers to acknowledge (can be per
> > > > message)
> > > > > so
> > > > > > offset if wrong terminology here.
> > > > > > For producers, not sure exactly its usage. Maybe if they need to
> > > refer
> > > > to
> > > > > > this message later when reading by Reader interface.
> > > > > > I would correct this section.
> > > > > >
> > > > > > However, right now Pulsar doesn't support correct deserialization
> > of
> > > > > > > multi-topic or partitioned-topic because of that 1acknowledge`
> > API
> > > > call
> > > > > > > fails for those topics with below error
> > > > > >
> > > > > >
> > > > > > You're saying that the acknowledgement API method signature
> > receives
> > > > > > MessageId, but do not receive TopicMessageId?
> > > > > >
> > > > > > I have a few questions on that:
> > > > > >
> > > > > > 1. The acknowledgement API is part of Pulsar binary protocol. Is
> > your
> > > > > plan
> > > > > > to alter that protocol so it will also include the topic field as
> > > part
> > > > of
> > > > > > the message ID?
> > > > > >
> > > > > > 2. I think your PIP needs to explain the following items which
> are
> > > > > missing
> > > > > > as context:
> > > > > > - There are two implementation for MessageId interface, one for
> > topic
> > > > and
> > > > > > one for partitioned topic.
> > > > > > - The problem is that the serialization/desrialization method is
> > used
> > > > > > mainly for translating the ID into the binary protocol, which
> only
> > > > > requires
> > > > > > the ID (ledger ID, entry ID).
> > > > > > - The reason for that is that once you created a consumer, it
> has a
> > > > topic
> > > > > > attached to it. Transferring the topic for the ack is redundant.
> > > > > >
> > > > > > All of this needs to be in the background.
> > > > > >
> > > > > > I have several ideas on solving that, which IMO should mainly be
> in
> > > the
> > > > > > client level, but I must get answers to the questions above
> before
> > I
> > > > can
> > > > > > continue.
> > > > > >
> > > > > > Last note
> > > > > > You have basically placed a link to a pull request as the design
> > > > solution
> > > > > > (high-level/detailed design).
> > > > > > The whole idea of the design is that you describe the solution
> > > without
> > > > > > resorting to code.
> > > > > > IMO you should amend the design, state the goal shortly, and
> have a
> > > > high
> > > > > > level design section which contains 1-2 short paragraphs
> describing
> > > > > exactly
> > > > > > your solution.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Asaf
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, May 9, 2023 at 3:24 PM Yunze Xu <xy...@apache.org> wrote:
> > > > > >
> > > > > > > I'm talking about whether to add a new separate API. I'm
> > concerned
> > > > > > > about whether existing applications would be affected, no
> matter
> > if
> > > > > > > the existing implementation has the limitation. If yes, we
> should
> > > > > > > document it in the PIP so that users can know that.
> > > > > > >
> > > > > > > > it's a new optional field which would not break the
> > compatibility
> > > > > > >
> > > > > > > And yes, I just confirmed it with simple demos in my local env.
> > So
> > > > I'm
> > > > > > > +1 to this proposal.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yunze
> > > > > > >
> > > > > > > On Tue, May 9, 2023 at 3:05 PM Rajan Dhabalia <
> > > rdhabalia@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Weill there are multiple things: it's a new optional field
> > which
> > > > > would
> > > > > > > not
> > > > > > > > break the compatibility , also current messaegId
> serialization
> > > and
> > > > > > > > deserialization anyway only impact multi-topic consumer which
> > is
> > > > > > already
> > > > > > > > broken or has the limitation and, adding a new separate API
> for
> > > > > > > partitioned
> > > > > > > > topic is not only not acceptable but creates too much
> confusion
> > > for
> > > > > > users
> > > > > > > > to use separate ack APIs for non-partition and partition
> topics
> > > and
> > > > > > that
> > > > > > > > also breaks partitioned topic abstraction which we would like
> > to
> > > > > avoid.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Rajan
> > > > > > > >
> > > > > > > > On Mon, May 8, 2023 at 11:27 PM Yunze Xu <xy...@apache.org>
> > wrote:
> > > > > > > >
> > > > > > > > > It seems that `TopicMessageIdImpl#toByteArray` now could
> > > > serialize
> > > > > > the
> > > > > > > > > optional topic field to the bytes. I didn't test it now
> but I
> > > > have
> > > > > a
> > > > > > > > > concern about whether it would bring a breaking change.
> > > > > > > > >
> > > > > > > > > Assuming there are two applications (let's say A and B)
> based
> > > on
> > > > an
> > > > > > > > > older Pulsar client, A writes serialized bytes into a
> file, B
> > > > reads
> > > > > > > > > bytes from the file and parses it to a MessageId. If A
> > upgraded
> > > > its
> > > > > > > > > Pulsar client to the latest while B did not, what would
> > happen?
> > > > > Could
> > > > > > > > > B still get the correct MessageId or the bytes would not be
> > > able
> > > > to
> > > > > > > > > parsed?
> > > > > > > > >
> > > > > > > > > P.S. it's better to add the API changes and potential
> > breaking
> > > > > > changes
> > > > > > > > > in the proposal.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Yunze
> > > > > > > > >
> > > > > > > > > On Tue, May 9, 2023 at 1:59 PM Rajan Dhabalia <
> > > > > rdhabalia@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Pulsar api provides MessageId interface which is
> generally
> > > used
> > > > > by
> > > > > > > > > producer
> > > > > > > > > > and consumer applications to manage topic offset.
> > Sometimes,
> > > > > these
> > > > > > > > > > applications would like to serialize and deserialize
> > > > messageIds,
> > > > > > > > > > specifically consumer app which would like to persist
> > > messageId
> > > > > and
> > > > > > > ack
> > > > > > > > > > with those messageIds by deserializing them. However,
> right
> > > now
> > > > > > > Pulsar
> > > > > > > > > > doesn't support correct deserialization of multi-topic or
> > > > > > > > > partitioned-topic
> > > > > > > > > > because of that 1acknowledge` API call fails for those
> > topics
> > > > > with
> > > > > > > below
> > > > > > > > > > error:
> > > > > > > > > > "Only TopicMessageId is allowed to acknowledge for a
> > > > multi-topics
> > > > > > > > > consumer"
> > > > > > > > > >
> > > > > > > > > > MessageIdImpl stores id metadata into MessageIdData which
> > > > doesn't
> > > > > > > contain
> > > > > > > > > > context about topic name to find out which topic belongs
> to
> > > > this
> > > > > > > > > MessageID.
> > > > > > > > > > Therefore, we need to add topic-name into MessageIdData
> and
> > > > allow
> > > > > > > > > > multi-topic/partitioned topics to deserialize messages
> > > > correctly
> > > > > > so,
> > > > > > > > > > consumer app can perform as expected.
> > > > > > > > > >
> > > > > > > > > > Please visit PIP for any suggestions:
> > > > > > > > > > https://github.com/apache/pulsar/issues/20221
> > > > > > > > > >
> > > > > > > > > > This PIP is created with PR:
> > > > > > > https://github.com/apache/pulsar/pull/19944
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Rajan
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-267: Support multi-topic messageId deserialization to ack messages

Posted by Asaf Mesika <as...@gmail.com>.
First, let me add some data that should be added to the Background section
of the PIP since I had to reverse engineer the code to understand that,
which is the opposite of the goal of a design document.

----
Pulsar Broker has a binary protocol, which allows the client to consume
messages, acknowledge them, and much more. The protocol comprises Commands
containing the data needed to apply that Command on the broker side. Many
commands allow a consumer (client) to acknowledge messages, among them:
CommandSendReceipt, CommandSend, CommandAck, and more. All those commands
use the message type MessageIdData to specify the details of the message to
acknowledge.

Here's what this data structure looks like:
message MessageIdData {
required uint64 ledgerId = 1;
required uint64 entryId = 2;
optional int32 partition = 3 [default = -1];
optional int32 batch_index = 4 [default = -1];
repeated int64 ack_set = 5;
optional int32 batch_size = 6;

// For the chunk message id, we need to specify the first chunk message id.
optional MessageIdData first_chunk_message_id = 7;
}

The key fields are the ledgerID at which the message is contained and
entryId, which indicates the offset inside the ledger (message number).

The client uses a class named MessageIdData which is the auto-generated
code representing the message MessageIdData.
---------

Now, in the design, you wrote:

> Thefore, we need to add topic-name into MessageIdData and allow
> multi-topic/partitioned topic to deserialize message correctly so, API like
> acknowledge can perform as expected.


So you say in that sentence that you will add the topic name into
MessageIdData.
MessageIdData is defined in PulsarApi.proto and is transferred over the
wire, so how can you add the topic to this class without changing the wire
protocol?





On Fri, Jun 16, 2023 at 10:47 PM Rajan Dhabalia <rd...@apache.org>
wrote:

> Yes, the topic name will not be transferred and it's not part of the wire
> protocol. Message uses MessageID protobuf data-structure to serialize and
> deserialize MessageId and it doesn't change any behavior nor will transfer
> any additional fields to the broker. and I would not like to introduce any
> additional data-structure as that will create data copy, field
> inconsistencies, and more garbage due to more object allocation and that's
> something we would like to avoid.
>
> Thanks,
> Rajan
>
> On Mon, May 15, 2023 at 6:18 AM PengHui Li <pe...@apache.org> wrote:
>
> > I think the topic name will not be transmitted to the broker.
> > The client side used the class generated by the protobuf message.
> > Or, we can create another class to avoid coupling issues, but it
> > will introduce more changes and copy data from one structure
> > to another. For the long-term, I think it should be a good way if
> > we don't have blockers with this solution. Because I don't think
> > there is a higher priority in the long run than keeping the protocol
> clear.
> >
> > If the above options are not feasible. At least, we should clarify
> > it in the proposal and add comments in the proto file to avoid
> > other clients transmitting the topic name to the broker.
> >
> > Thanks,
> > Penghui
> >
> > On Fri, May 12, 2023 at 5:59 PM Asaf Mesika <as...@gmail.com>
> wrote:
> >
> > > I don't get it - you say msgId is a data structure contained within
> > > MessageId implementation, right? I presume msgId is the data structure
> > the
> > > client transmit to the server, so that means you are transmitting topic
> > to
> > > the server?
> > >
> > >
> > > On Fri, May 12, 2023 at 7:45 AM Rajan Dhabalia <rd...@apache.org>
> > > wrote:
> > >
> > > > Thank you for sharing your knowledge about the PIP which should be
> > > created
> > > > before PR and I think everyone in the community knows about it. but
> you
> > > can
> > > > check the PR for context which was blocked for sometime and we
> decided
> > to
> > > > create PIP with proto changes.
> > > >
> > > > This PIP/PR tries to fix the issue where partitioned topic fails
> while
> > > > acking deserialized messageId. topic name will be part of MsgIdData
> > which
> > > > is the data-structure used by messageID to store msgID context along
> > with
> > > > partition, batching, and other metadata. topic name will be attached
> > only
> > > > when the user tries to serialize and deserialize the messageId which
> > will
> > > > be purely client side implementation and in other cases it will not
> be
> > > > transmitted to server. Also, partitioned topic's abstract concept for
> > > user
> > > > and messageID must be also remain abstract for users and users must
> not
> > > > know about different implementation of messageId and our goal is to
> > > > maintain that abstraction without telling user about MessageIdImpl or
> > > > TopicMessageIdImpl.
> > > >
> > > > Thanks,
> > > > Rajan
> > > >
> > > >
> > > > On Thu, May 11, 2023 at 7:29 AM Asaf Mesika <as...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Rajan,
> > > > >
> > > > > A few comments on the PIP as I couldn't understand it fully as some
> > > > pieces
> > > > > of information is missing.
> > > > >
> > > > > First, I would like to remind about the rules, that exists in the
> > > > beginning
> > > > > of the PIP template:
> > > > >
> > > > > <!--
> > > > > RULES
> > > > > * Never place a link to an external site like Google Doc. The
> > proposal
> > > > > should be in this issue entirely.
> > > > > * Use a spelling and grammar checker tools if available for you
> > (there
> > > > are
> > > > > plenty of free ones)
> > > > >
> > > > > PROPOSAL HEALTH CHECK
> > > > > I can read the design document and understand the problem statement
> > and
> > > > > what you plan to change *without* resorting to a couple of hours of
> > > code
> > > > > reading just to start having a high level understanding of the
> > change.
> > > > > -->
> > > > >
> > > > >
> > > > > In this specific case
> > > > > 1. I would include explanation and detail the data structures
> fields
> > of
> > > > > objects you mentioned, such as: MessageIdImpl and MessageIdData.
> > > > > 2. I would not put a PR as the design section, so I need to read
> code
> > > to
> > > > > understand what the exact solution details.
> > > > >
> > > > > You wrote:
> > > > >
> > > > > > Pulsar api provides MessageId
> > > > > > <
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/MessageId.java
> > > > >
> > > > > interface
> > > > > > which is generally used by producer and consumer applications to
> > > manage
> > > > > > topic offset.
> > > > >
> > > > >
> > > > > I think it's used to allow consumers to acknowledge (can be per
> > > message)
> > > > so
> > > > > offset if wrong terminology here.
> > > > > For producers, not sure exactly its usage. Maybe if they need to
> > refer
> > > to
> > > > > this message later when reading by Reader interface.
> > > > > I would correct this section.
> > > > >
> > > > > However, right now Pulsar doesn't support correct deserialization
> of
> > > > > > multi-topic or partitioned-topic because of that 1acknowledge`
> API
> > > call
> > > > > > fails for those topics with below error
> > > > >
> > > > >
> > > > > You're saying that the acknowledgement API method signature
> receives
> > > > > MessageId, but do not receive TopicMessageId?
> > > > >
> > > > > I have a few questions on that:
> > > > >
> > > > > 1. The acknowledgement API is part of Pulsar binary protocol. Is
> your
> > > > plan
> > > > > to alter that protocol so it will also include the topic field as
> > part
> > > of
> > > > > the message ID?
> > > > >
> > > > > 2. I think your PIP needs to explain the following items which are
> > > > missing
> > > > > as context:
> > > > > - There are two implementation for MessageId interface, one for
> topic
> > > and
> > > > > one for partitioned topic.
> > > > > - The problem is that the serialization/desrialization method is
> used
> > > > > mainly for translating the ID into the binary protocol, which only
> > > > requires
> > > > > the ID (ledger ID, entry ID).
> > > > > - The reason for that is that once you created a consumer, it has a
> > > topic
> > > > > attached to it. Transferring the topic for the ack is redundant.
> > > > >
> > > > > All of this needs to be in the background.
> > > > >
> > > > > I have several ideas on solving that, which IMO should mainly be in
> > the
> > > > > client level, but I must get answers to the questions above before
> I
> > > can
> > > > > continue.
> > > > >
> > > > > Last note
> > > > > You have basically placed a link to a pull request as the design
> > > solution
> > > > > (high-level/detailed design).
> > > > > The whole idea of the design is that you describe the solution
> > without
> > > > > resorting to code.
> > > > > IMO you should amend the design, state the goal shortly, and have a
> > > high
> > > > > level design section which contains 1-2 short paragraphs describing
> > > > exactly
> > > > > your solution.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Asaf
> > > > >
> > > > >
> > > > >
> > > > > On Tue, May 9, 2023 at 3:24 PM Yunze Xu <xy...@apache.org> wrote:
> > > > >
> > > > > > I'm talking about whether to add a new separate API. I'm
> concerned
> > > > > > about whether existing applications would be affected, no matter
> if
> > > > > > the existing implementation has the limitation. If yes, we should
> > > > > > document it in the PIP so that users can know that.
> > > > > >
> > > > > > > it's a new optional field which would not break the
> compatibility
> > > > > >
> > > > > > And yes, I just confirmed it with simple demos in my local env.
> So
> > > I'm
> > > > > > +1 to this proposal.
> > > > > >
> > > > > > Thanks,
> > > > > > Yunze
> > > > > >
> > > > > > On Tue, May 9, 2023 at 3:05 PM Rajan Dhabalia <
> > rdhabalia@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > Weill there are multiple things: it's a new optional field
> which
> > > > would
> > > > > > not
> > > > > > > break the compatibility , also current messaegId serialization
> > and
> > > > > > > deserialization anyway only impact multi-topic consumer which
> is
> > > > > already
> > > > > > > broken or has the limitation and, adding a new separate API for
> > > > > > partitioned
> > > > > > > topic is not only not acceptable but creates too much confusion
> > for
> > > > > users
> > > > > > > to use separate ack APIs for non-partition and partition topics
> > and
> > > > > that
> > > > > > > also breaks partitioned topic abstraction which we would like
> to
> > > > avoid.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Rajan
> > > > > > >
> > > > > > > On Mon, May 8, 2023 at 11:27 PM Yunze Xu <xy...@apache.org>
> wrote:
> > > > > > >
> > > > > > > > It seems that `TopicMessageIdImpl#toByteArray` now could
> > > serialize
> > > > > the
> > > > > > > > optional topic field to the bytes. I didn't test it now but I
> > > have
> > > > a
> > > > > > > > concern about whether it would bring a breaking change.
> > > > > > > >
> > > > > > > > Assuming there are two applications (let's say A and B) based
> > on
> > > an
> > > > > > > > older Pulsar client, A writes serialized bytes into a file, B
> > > reads
> > > > > > > > bytes from the file and parses it to a MessageId. If A
> upgraded
> > > its
> > > > > > > > Pulsar client to the latest while B did not, what would
> happen?
> > > > Could
> > > > > > > > B still get the correct MessageId or the bytes would not be
> > able
> > > to
> > > > > > > > parsed?
> > > > > > > >
> > > > > > > > P.S. it's better to add the API changes and potential
> breaking
> > > > > changes
> > > > > > > > in the proposal.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Yunze
> > > > > > > >
> > > > > > > > On Tue, May 9, 2023 at 1:59 PM Rajan Dhabalia <
> > > > rdhabalia@apache.org>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Pulsar api provides MessageId interface which is generally
> > used
> > > > by
> > > > > > > > producer
> > > > > > > > > and consumer applications to manage topic offset.
> Sometimes,
> > > > these
> > > > > > > > > applications would like to serialize and deserialize
> > > messageIds,
> > > > > > > > > specifically consumer app which would like to persist
> > messageId
> > > > and
> > > > > > ack
> > > > > > > > > with those messageIds by deserializing them. However, right
> > now
> > > > > > Pulsar
> > > > > > > > > doesn't support correct deserialization of multi-topic or
> > > > > > > > partitioned-topic
> > > > > > > > > because of that 1acknowledge` API call fails for those
> topics
> > > > with
> > > > > > below
> > > > > > > > > error:
> > > > > > > > > "Only TopicMessageId is allowed to acknowledge for a
> > > multi-topics
> > > > > > > > consumer"
> > > > > > > > >
> > > > > > > > > MessageIdImpl stores id metadata into MessageIdData which
> > > doesn't
> > > > > > contain
> > > > > > > > > context about topic name to find out which topic belongs to
> > > this
> > > > > > > > MessageID.
> > > > > > > > > Therefore, we need to add topic-name into MessageIdData and
> > > allow
> > > > > > > > > multi-topic/partitioned topics to deserialize messages
> > > correctly
> > > > > so,
> > > > > > > > > consumer app can perform as expected.
> > > > > > > > >
> > > > > > > > > Please visit PIP for any suggestions:
> > > > > > > > > https://github.com/apache/pulsar/issues/20221
> > > > > > > > >
> > > > > > > > > This PIP is created with PR:
> > > > > > https://github.com/apache/pulsar/pull/19944
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Rajan
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>