Posted to dev@pulsar.apache.org by Asaf Mesika <as...@gmail.com> on 2022/09/01 12:51:51 UTC

Re: [DISCUSS] Alternatives to changing public protocol

>
> As Penghui suggested, this field name is changed to `message_id` for
> potential generic usage. :)
>
That's the thing: it's not really for potential generic use. It's more
for potential *internal* generic usage, which is publicly exposed.
When an outside visitor looks at the API, they ask themselves, "Why should I
provide a message ID for a message I'm publishing? Isn't the ID something the
broker creates for itself?" This creates confusion, which IMO leads to
less adoption and makes it harder to contribute. I'm quite new to Pulsar,
and I feel that there is confusion in quite a few parts of the system.
My suggestion is raised here to try to avoid that confusion.


> > The second problem is clients: Every such field will eventually trickle
> > down to the clients, which will need to ignore that field. In my opinion,
> > it makes things harder for the clients' maintainers, especially when the
> > community's goal is to expand and have clients in many languages
> > maintained by the community.
>
> Our current client implementation is already quite complex. Compared with
> that, ignoring a few fields does not seem to be significantly hard, as long
> as we document it well, right?
>
Having internal fields makes the client even more complex. It's not just
about ignoring fields; it's about having more and more of them.
What I suggest is separating those internal use cases into an internal API
and an internal client. I'm not only referring to PIP-180, but to any
future PIP.


>
> > I believe someone who tries to reason about Pulsar, and its architecture,
> > by looking at its public API should not have any fields which will never
> > be relevant to the reader.  It makes it hard to reason and understand the
> > public API.
> >
>
> This design principle of keeping the public API clean is clear and easy to
> understand and I totally support this. But in the case of PIP-180 or
> geo-replication, the replicator can be considered as a special producer
> client, and it just inherited the basic semantic of a normal producer and
> extended its abilities to support some special internal usage.
>
> Of course we can use a different protocol and a different port for strictly
> inter-broker communications in theory. But the side effect of this would be
> more code, more machine resource usage, harder maintenance, and a longer
> time to make the feature stable, compared with just extending the abilities
> of the producer client.
>
> If it comes to a case where inter-broker communication is needed and it is
> not a case of producer or consumer, I think we should definitely consider
> introducing a dedicated port and protocol.
>
>
Again, my suggestion mainly applies to the future: to make a
conscious decision to avoid overloading the public API with more internal
use cases.
PIP-180 is currently a good case study for exploring that suggestion (well,
the ship has sailed, but it is still a good example).

I reiterate what I said before: you can say this sentence about any new
internal feature: "the X can be considered a special producer client, and
it just inherited the basic semantic of a normal producer and
extended its abilities to support some special internal usage". Replace X
with any feature, and you keep expanding the public API with more and more
internal fields the normal user should never know about, which defeats the
whole notion of encapsulation and simplicity.

I would also like others to chime in on this and get their thoughts as well.


> On 2022/07/20 15:47:16 Asaf Mesika wrote:
> > Hi,
> >
> > We started discussing in PIP-180, which Penghui recommended I move to a
> > dedicated thread.
> >
> > Pulsar has a public API in its binary protocol, which the clients use to
> > communicate with it. Nonetheless, it is its public API to the server.
> >
> > I believe the public API should not be changed for internal communication
> > purposes. PIP-180 gives a really good example: We would like to introduce
> > a new feature called Shadow Topic and would like to replicate messages
> > from the source topic to the Shadow topic. It just so happens to be that
> > the replication mechanism uses the Broker public API to send messages to a
> > broker. The design would like to expand on that by adding a field to this
> > public API, to serve that specific feature needs (the field is not
> > generic, it's specifically named shadow_message_id).
> >
> > I believe someone who tries to reason about Pulsar, and its architecture,
> > by looking at its public API should not have any fields which will never
> > be relevant to the reader.  It makes it hard to reason and understand the
> > public API.
> >
> > The second problem is clients: Every such field will eventually trickle
> > down to the clients, which will need to ignore that field. In my opinion,
> > it makes things harder for the clients' maintainers, especially when the
> > community's goal is to expand and have clients in many languages
> > maintained by the community.
> >
> > The public API today already contains many fields which are only for
> > internal use. Here are a few that I found (please correct me if I'm wrong
> > here):
> >
> > // Property set on replicated message,
> > // includes the source cluster name
> > optional string replicated_from = 5;
> >
> > // Override namespace's replication
> > repeated string replicate_to    = 7;
> >
> > // Identify whether a message is a "marker" message used for
> > // internal metadata instead of application published data.
> > // Markers will generally not be propagated back to clients
> > optional int32 marker_type = 20;
> >
> >
> > I would like to discuss that with you, get your feedback and whether you
> > think it's correct to accept a decision to avoid changing the public API.
> >
> > One alternative I was thinking about (I'm still fairly new, so I don't
> > have all the experience and context here) is creating an internal
> > non-public API, which will be used for internal communication: different
> > proto, different port.
> >
> > Thanks for your time,
> >
> > Asaf
> >
>