Posted to dev@pulsar.apache.org by Michael Marshall <mi...@gmail.com> on 2021/04/20 21:13:22 UTC

[Discuss] PIP to add system topic for topic creation/deletion events

Hello all,

I would like to propose adding a new feature to Pulsar that will require a
PIP. In addition to feedback on the proposed feature, I am looking for
guidance on how to go about creating the PIP. Thanks for any help you can
provide.

I would like to add an optional system topic where topic creation and topic
deletion events are published. This feature will make it easier to leverage
the auto topic creation and inactive topic deletion features by providing a
way for users to reactively discover changes to topics. The largest benefit
is that users won't need to poll for these updates with an admin client.
Instead, they will get them as messages.

I looked to see if an equivalent feature already exists, but I don't see
one. For reference, the `PatternMultiTopicsConsumerImpl` currently polls
for all topics in a namespace and then does set operations to compute the
"new" topics to which it should subscribe. This client implementation could
possibly leverage the new feature.
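
To illustrate the polling approach described above, here is a minimal Python sketch of the set-difference step. It is not the actual `PatternMultiTopicsConsumerImpl` code: `diff_topics` and the topic names are hypothetical, and the two lists stand in for successive admin-client listings of one namespace.

```python
def diff_topics(previous, current):
    """Return (added, removed) between two snapshots of topic names."""
    previous, current = set(previous), set(current)
    added = sorted(current - previous)    # topics to subscribe to
    removed = sorted(previous - current)  # topics to drop
    return added, removed


# Two hypothetical poll results for the same namespace.
first_poll = ["persistent://public/default/a", "persistent://public/default/b"]
second_poll = ["persistent://public/default/b", "persistent://public/default/c"]

added, removed = diff_topics(first_poll, second_poll)
print("subscribe to:", added)
print("unsubscribe from:", removed)
```

Every poll pays for a full listing even when nothing has changed, which is the cost an event topic would avoid.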

There are still details I need to work out, like how it will work for
partitioned vs unpartitioned topics and what kind of guarantees we have
regarding messaging semantics (I think we'll want at-least-once message
delivery here). I plan to include these details in the PIP with discussions
about trade-offs for different implementations.
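
At-least-once delivery means a consumer can see the same event more than once, so event handling has to be idempotent. The following Python sketch shows one way a consumer could cope. The `TopicEvent` shape, including a publisher-assigned `event_id`, is an assumption made for illustration; the proposal does not define a schema.

```python
from dataclasses import dataclass

VALID_KINDS = ("created", "deleted")


@dataclass(frozen=True)
class TopicEvent:
    # Hypothetical event shape, not part of the proposal.
    event_id: int  # assumed unique per event
    kind: str      # "created" or "deleted"
    topic: str


class TopicRegistry:
    """Tracks known topics and ignores redelivered events."""

    def __init__(self):
        self.topics = set()
        self._seen_ids = set()  # unbounded here; a real consumer would prune it

    def apply(self, event):
        """Apply an event once; return False if it was a duplicate."""
        if event.kind not in VALID_KINDS:
            raise ValueError(f"unknown event kind: {event.kind}")
        if event.event_id in self._seen_ids:
            return False
        self._seen_ids.add(event.event_id)
        if event.kind == "created":
            self.topics.add(event.topic)
        else:
            self.topics.discard(event.topic)
        return True


registry = TopicRegistry()
orders = "persistent://public/default/orders"
deliveries = [
    TopicEvent(1, "created", orders),
    TopicEvent(1, "created", orders),  # immediate redelivery
    TopicEvent(2, "deleted", orders),
    TopicEvent(1, "created", orders),  # late redelivery must not resurrect the topic
]
for event in deliveries:
    registry.apply(event)
print(sorted(registry.topics))  # prints []
```

Without the duplicate check, the late redelivery of event 1 would re-add a topic that has already been deleted.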

Does this feature sound helpful and reasonable to others? If so, is the
next step to formally write a proposal in a Google Doc or to put together a
doc on the Pulsar GitHub Wiki?

Related and/or future work to consider in this design: I can see adding
different system topics for other types of auditable system events. We
currently rely on log lines as our primary way for end users to audit
system events, e.g. a producer connecting to a broker or a subscription
getting created, but we could instead have topics that represent streams of
these different kinds of events. A persistent topic could make these audit
events more durable and more structured, which should make them easier to
analyze. Further, users could choose to turn these audit events on or off,
perhaps at the broker or namespace level, to fit their own needs.

Let me know what you think and how I should proceed.

Regards,
Michael Marshall

Re: [E] Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Devin Bost <de...@gmail.com>.
>> Could we add a system topic that has exactly one partition per broker?

> I think this depends on which type of topic we use for the events. If it is
> nonpersistent, I think this approach would work because the events wouldn't
> outlive the broker.

So, what happens to events in a nonpersistent topic when the broker goes
down? How would data loss be prevented in that scenario?

> However, if it is persistent, it would become
> problematic when a broker stops running because the topic would then need
> to be served by another broker in order to read from it.

What's the concern here? When the topic is unloaded, another broker will
pick up the data from the bookies.

--
Devin G. Bost


Re: [E] Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Michael Marshall <mi...@gmail.com>.
Thanks for the clarification, Joe. I now see the nuance between the data
and admin paths. One way to possibly remove these updates from the data
flow is to make it a process that watches the metastore and sends metastore
changes to a topic. That would remove it completely from the data path.
However, there is still the problem of where that process gets scheduled
and how to ensure that it is collocated with the topic to which it is
publishing messages. Further, this paradigm could mean that some events
might get missed, and it would put additional load on the metastore.
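
The missed-event concern can be made concrete with a small Python sketch. It assumes the watcher only ever sees snapshots of the metastore's topic list, which is a simplification; `events_from_snapshots` and the topic names are hypothetical.

```python
def events_from_snapshots(snapshots):
    """Derive (kind, topic) events by diffing consecutive snapshots."""
    events = []
    for before, after in zip(snapshots, snapshots[1:]):
        before, after = set(before), set(after)
        events += [("created", t) for t in sorted(after - before)]
        events += [("deleted", t) for t in sorted(before - after)]
    return events


# What actually happened in the metastore: "tmp" was created, then deleted.
full_history = [{"a"}, {"a", "tmp"}, {"a"}]
# What a slow or restarted watcher observed: only the first and last states.
observed = [full_history[0], full_history[-1]]

print(events_from_snapshots(full_history))
print(events_from_snapshots(observed))
```

The second call yields no events at all: a topic that is created and deleted between two observations never appears in the derived stream.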

> What are you hoping to accomplish by knowing when a topic is automatically
> created?

I am looking to solve several problems. First, I want to discover new
topics so that I can create subscriptions for them. I use Pulsar to buffer
data, so using a simple regex consumer that discovers topics, creates
subscriptions, and immediately consumes from those topics is not a viable
solution. Also, I have an external database that exposes topic stats joined
with relevant business metadata for each topic. I want to know when topics
are deleted so I can update the database appropriately. I want to avoid any
solution that requires polling Pulsar's admin API.

As Joe pointed out, there are many other potential cluster events that
could be useful. This PIP could be more general than my initial proposal.

> Could we add a system topic that has exactly one partition per broker?

I think this depends on which type of topic we use for the events. If it is
nonpersistent, I think this approach would work because the events wouldn't
outlive the broker. However, if it is persistent, it would become
problematic when a broker stops running because the topic would then need
to be served by another broker in order to read from it.

One alternative might be to put a system topic partition in each namespace
bundle. Given that all topics exist within a bundle and bundles split but
don't join, it would guarantee that events would be local to the target
topic without needing to worry about joining event logs. This would require
a change to how topics are put into bundles though, as they are currently
assigned based on a hash of their name. This approach would only make sense
for events that are specific to a namespace, like topic/subscription
creation/deletion.
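
As a rough Python sketch of the bundle idea: a namespace's 32-bit hash space is cut into contiguous ranges (bundles), a topic lands in the bundle that covers the hash of its name, and a split only subdivides an existing range. `BundleMap` is hypothetical, and `zlib.crc32` is a stand-in for whatever hash the broker really uses.

```python
import bisect
import zlib

FULL_RANGE_END = 0x100000000  # exclusive upper bound of the 32-bit hash space


def topic_hash(topic):
    # Stand-in hash; the broker's real hash function may differ.
    return zlib.crc32(topic.encode("utf-8")) & 0xFFFFFFFF


class BundleMap:
    """Hash-range bundles for one namespace; bundles split but never merge."""

    def __init__(self, boundaries=(0, FULL_RANGE_END)):
        self.boundaries = sorted(boundaries)  # bundle i is [b[i], b[i+1])

    def bundle_for(self, topic):
        i = bisect.bisect_right(self.boundaries, topic_hash(topic)) - 1
        return (self.boundaries[i], self.boundaries[i + 1])

    def split(self, bundle):
        """Split an existing bundle at its midpoint; return both halves."""
        low, high = bundle
        i = bisect.bisect_left(self.boundaries, low)
        if self.boundaries[i:i + 2] != [low, high] or high - low < 2:
            raise ValueError(f"cannot split {bundle}")
        mid = (low + high) // 2
        self.boundaries.insert(i + 1, mid)
        return (low, mid), (mid, high)


bundles = BundleMap()
topic = "persistent://public/default/orders"
parent = bundles.bundle_for(topic)
left, right = bundles.split(parent)
child = bundles.bundle_for(topic)
print(child in (left, right))  # prints True
```

Because placement follows the name hash, a system topic partition with an arbitrary name cannot be pinned to a chosen bundle, which is why the paragraph above notes that topic-to-bundle assignment would have to change.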

We may need multiple strategies regarding topic placement for different
types of audit events. For example, some broker events are not namespaced,
and as such, they likely belong in the `pulsar/system` namespace.
Namespaced events would make sense in their source namespace, much like the
`__change_events` topic exists in each namespace where topic level policies
are allowed.
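
A small Python sketch of that placement rule follows. The `pulsar/system` namespace comes from the paragraph above; the topic names `__topic_events` and `__broker_events` are invented for illustration (modeled on `__change_events`) and are not existing Pulsar topics.

```python
SYSTEM_NAMESPACE = "pulsar/system"         # named in the discussion above
NAMESPACE_EVENTS_TOPIC = "__topic_events"  # hypothetical name
BROKER_EVENTS_TOPIC = "__broker_events"    # hypothetical name


def event_topic(namespace=None):
    """Return the full topic name an audit event would be published to.

    Pass a "tenant/namespace" string for namespaced events, or None for
    broker-level events that have no source namespace.
    """
    if namespace is None:
        return f"persistent://{SYSTEM_NAMESPACE}/{BROKER_EVENTS_TOPIC}"
    tenant, _, ns = namespace.partition("/")
    if not tenant or not ns or "/" in ns:
        raise ValueError(f"expected 'tenant/namespace', got {namespace!r}")
    return f"persistent://{tenant}/{ns}/{NAMESPACE_EVENTS_TOPIC}"


print(event_topic("public/default"))
print(event_topic())
```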

Perhaps I should put together a Google doc for this proposal to make it
easier to collaborate on specific details. I can tell that there is
interest in this feature and that it will require a careful design.

Thanks for all of your feedback,
Michael


On Fri, Apr 23, 2021 at 7:29 AM Jonathan Ellis <jb...@gmail.com> wrote:

> Could we add a system topic that has exactly one partition per broker?
>
> On Thu, Apr 22, 2021 at 11:22 PM Joe Francis <joef@verizonmedia.com.invalid> wrote:
>
> > To be clear, I would love to have this feature. But I would not use this
> > feature if that means whenever a broker that hosts a "system topic" has a
> > hiccup, it would result in an outage for N other brokers. I run 100+
> > brokers/million+ topics in a cluster (hence an "audit topic" would be
> > wonderful for all kinds of purposes), and would not want a "system topic"
> > as the single point of failure.
> >
> > So you have to make this log local to the broker, or sacrifice the
> > reliability of the log (best case log). A local log has its advantages: you
> > can log a lot more about the system itself into it (e.g. security events
> > like failed auth), but you will need to provide an aggregate view for
> > the cluster as a whole from all the brokers.
> >
> > Joe
> >
> >
> >
> >
> > On Thu, Apr 22, 2021 at 6:10 AM Joe Francis <jo...@verizonmedia.com> wrote:
> >
> > > Completely disagree that we have accepted this risk with PIP-39. That is
> > > different because it is an admin flow. A failure in a namespace policy
> > > change does not affect data flow.
> > >
> > > What you are proposing is in the data path. Topics and subs are
> > > created in the data flow path. Failure means outages. PIP-39 is not going
> > > to help you there.
> > >
> > > Joe
> > >
> > > On Wed, Apr 21, 2021 at 11:10 PM Michael Marshall <mikemarsh17@gmail.com> wrote:
> > >
> > >> Hi Joe,
> > >>
> > >> I agree there is a risk in adding more interdependencies between brokers.
> > >> I will point out that we have already accepted this risk with the
> > >> implementation of PIP 39, which propagates namespace policy changes to
> > >> other brokers using messages sent to a system topic. However, that doesn't
> > >> necessarily mean we should build more interdependencies between brokers.
> > >>
> > >> Here is the link to PIP 39:
> > >> https://github.com/apache/pulsar/wiki/PIP-39%3A-Namespace-Change-Events
> > >>
> > >> I will look into the implementation of PIP 39 to better understand its
> > >> design, as I think it will likely influence this feature's design.
> > >>
> > >> Thanks,
> > >> Michael
> > >>
> > >> On Wed, Apr 21, 2021 at 5:50 PM Joe F <jo...@gmail.com> wrote:
> > >>
> > >> > I would be very careful about implementing such a feature, because it
> > >> > risks introducing undesirable interdependencies. Broker processes only
> > >> > talk to the metadata store or data store. This keeps brokers isolated
> > >> > from each other: one broker is not dependent on the functioning of
> > >> > another broker.
> > >> >
> > >> > A broker publishing to a topic hosted on another broker (which, e.g., is
> > >> > serving a "system topic") sets up an undesirable dependency, which reduces
> > >> > total system resiliency and availability for the cluster. These are better
> > >> > implemented as notifications off the metadata changes.
> > >> >
> > >> > Good feature, but needs careful thought to do it right.
> > >> > Joe
> > >> >
> > >> > On Wed, Apr 21, 2021 at 4:03 PM Michael Marshall <mikemarsh17@gmail.com> wrote:
> > >> >
> > >> > > Thanks for your response, PengHui.
> > >> > >
> > >> > > I think this feature would be useful to end users for cluster management,
> > >> > > which is why I want to contribute a first-class feature instead of writing
> > >> > > my own plugin that would add little value to the community.
> > >> > >
> > >> > > > With the broker interceptor you can intercept all the REST API request
> > >> > > > and response, Pulsar commands between the broker and clients.
> > >> > >
> > >> > > Based on looking through the interceptor interface, I don't see a way to
> > >> > > trigger events based on auto-created/deleted topics. For example, when a
> > >> > > producer connects to a broker for a nonexistent topic (assuming auto topic
> > >> > > creation is allowed), a managed ledger, and thus a topic, is created
> > >> > > without ever interacting with that interceptor interface. The same appears
> > >> > > to be true for garbage-collected topics. I think we'll need more than this
> > >> > > interceptor to properly capture all cases where topics are created or
> > >> > > deleted.
> > >> > >
> > >> > > Regarding my reference to potential further work, it does appear that
> > >> > > low-level auditing of connections and Pulsar commands could be covered by
> > >> > > the interceptor. However, it would still be on the end user to implement
> > >> > > such functionality.
> > >> > >
> > >> > > Thanks,
> > >> > > Michael
> > >> > >
> > >> > >
> > >> > > On Wed, Apr 21, 2021 at 3:51 AM PengHui Li <codelipenghui@gmail.com> wrote:
> > >> > >
> > >> > > > Hi Michael,
> > >> > > >
> > >> > > > Currently, Pulsar supports a pluggable Broker Interceptor, which you can
> > >> > > > find here:
> > >> > > > https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-broker/src/main/java/org/apache/pulsar/broker/intercept/BrokerInterceptor.java
> > >> > > >
> > >> > > > With the broker interceptor you can intercept all the REST API requests
> > >> > > > and responses, as well as Pulsar commands between the broker and clients.
> > >> > > > This can be used to audit the system events.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Penghui
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>

Re: [E] Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Devin Bost <de...@gmail.com>.
> Could we add a system topic that has exactly one partition per broker?

Unfortunately, that just creates multiple single points of failure because
each partition has data that only exists on that partition. So, if any
partition fails, there's a gap, resulting in data loss.
--
Devin G. Bost

On Fri, Apr 23, 2021, 7:29 AM Jonathan Ellis <jb...@gmail.com> wrote:

> Could we add a system topic that has exactly one partition per broker?

Re: [E] Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Jonathan Ellis <jb...@gmail.com>.
Could we add a system topic that has exactly one partition per broker?

On Thu, Apr 22, 2021 at 11:22 PM Joe Francis <jo...@verizonmedia.com.invalid>
wrote:

> To be clear, I would love to have this feature. But I would not use this
> feature if it means that whenever a broker that hosts a "system topic" has a
> hiccup, it results in an outage for N other brokers. I run 100+
> brokers/million+ topics in a cluster (hence an "audit topic" would be
> wonderful for all kinds of purposes), and would not want a "system topic"
> as the single point of failure.
>
> So you have to make this log local to the broker, or sacrifice the
> reliability of the log (a best-effort log). A local log has its advantages:
> you can log a lot more about the system itself into it (e.g. security events
> like failed auth), but you will need to provide an aggregate view for
> the cluster as a whole from all the brokers.
>
> Joe

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced

Re: [E] Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Devin Bost <de...@gmail.com>.
>> > > Based on looking through the interceptor trait, I don't see a way to
>> > > trigger events based on auto created/deleted topics. For example, when a
>> > > producer connects to a broker for a nonexistent topic (assuming auto topic
>> > > creation is allowed), a managed ledger, and thus a topic, is created
>> > > without ever interacting with that interceptor trait. The same appears to
>> > > be true for garbage collected topics. I think we'll need more than this
>> > > interceptor to properly capture all cases where topics are created or
>> > > deleted.

What are you hoping to accomplish by knowing when a topic is automatically
created? Are you trying to determine when a topic is used for the first time
or when it hasn't been used for some period of time? I'm just looking for
some additional context to better understand the motivation. If you can
explain what you're trying to accomplish (and why you want this
functionality), that will help us zoom out and brainstorm the best path
forward with the big picture in mind.

I'm not opposed to tapping into Pulsar’s metadata; personally, I think
there's a lot of power in a capability that will allow us to subscribe to
changes in the cluster. I think it could open additional doors for
intelligent automation, monitoring/alerting, and integration with other
services.

I agree with Joe that it needs to be done in a way that doesn't weaken
Pulsar’s guarantees. So, that will require some brainstorming and
evaluation of available options.

The way my team has gotten around some of this is we only have one way to
create topics/functions/etc. Basically, any time someone wants to create
something in our cluster, they publish to a topic that one of our services
(which we call "fast-deploy") listens to, and fast-deploy uses Pulsar’s
REST API to make the changes. To simplify the implementation, we created a
UI that automates producing messages to the topic ingested by fast-deploy.
When we provision a topic, we're typically issuing tokens to control
access, constructing a tenant and namespace for ingestion, a namespace in a
shared tenant for consumption, and creating a passthrough function that
sends the data between them. These tasks are all done through additional
services that send messages to fast-deploy. However, we're not attempting
to listen to topic auto-creation events. So, there's still a piece missing.
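
As a rough illustration of the single-entry-point pattern described above
(not the actual fast-deploy code), here is a minimal Python sketch. It uses an
in-memory queue in place of the request topic and a hypothetical FakeAdmin
class in place of Pulsar's REST API; every name in it is an assumption made
for illustration only.

```python
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class FakeAdmin:
    """Hypothetical stand-in for an admin client; not Pulsar's real API."""
    topics: set = field(default_factory=set)

    def create_topic(self, name: str) -> None:
        self.topics.add(name)

    def delete_topic(self, name: str) -> None:
        self.topics.discard(name)


def run_provisioner(requests: Queue, admin: FakeAdmin) -> list:
    """Drain the request queue and apply each request through the admin client.

    Returns an audit trail of the changes that were applied.
    """
    audit = []
    while not requests.empty():
        action, topic = requests.get()
        if action == "create":
            admin.create_topic(topic)
        elif action == "delete":
            admin.delete_topic(topic)
        else:
            continue  # ignore unknown request types
        audit.append((action, topic))
    return audit


requests = Queue()
requests.put(("create", "persistent://ingest/ns/orders"))
requests.put(("create", "persistent://shared/ns/orders"))
requests.put(("delete", "persistent://ingest/ns/orders"))

admin = FakeAdmin()
audit = run_provisioner(requests, admin)

# A topic that a producer auto-creates bypasses the provisioner entirely,
# so it never appears in the audit trail.
admin.topics.add("persistent://ingest/ns/auto-created")
```

Because every requested change passes through one function, the audit trail
is complete for those changes, but the auto-created topic at the end shows the
missing piece: nothing records it.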

>> > A broker publishing to a topic hosted on another broker (which, e.g., is
>> > serving "system topic") sets up an undesirable dependency, which reduces
>> > total system resiliency and availability for the cluster

Is the concern here that a topic is owned by a single broker? It seems like
each broker would need its own local topic that would get replicated
(maybe that's the wrong word here) to the other brokers. That data could be
shared through BookKeeper entries, and brokers could be notified of new
data by watching for changes on the metadata layer (e.g. on a ZK znode)
that indicate new entries have been written to the bookies. It would add
latency if everything needed to pass through the data layer, but I'm not
sure how data could safely be shared between brokers directly without using
a distributed consensus algorithm like Raft. That would be a huge
architectural change for Pulsar, and it wouldn't be without latency either.
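
To make the idea above more concrete, here is a small Python sketch of
"write locally, notify through metadata". It is only a model under stated
assumptions: MetadataStore stands in for something like a ZK znode with
watches, SharedLog stands in for entries stored in the data layer, and the
/events/<broker> key layout is invented for this example. None of these are
real Pulsar or BookKeeper APIs.

```python
from collections import defaultdict


class MetadataStore:
    """Hypothetical metadata store with watches (think of a ZK znode)."""

    def __init__(self):
        self.values = {}
        self.watchers = defaultdict(list)

    def watch(self, key, callback):
        self.watchers[key].append(callback)

    def put(self, key, value):
        self.values[key] = value
        for callback in self.watchers[key]:
            callback(key, value)


class SharedLog:
    """Hypothetical stand-in for entries stored in the data layer."""

    def __init__(self):
        self.entries = defaultdict(list)  # broker name -> list of events

    def append(self, broker, event):
        self.entries[broker].append(event)
        return len(self.entries[broker])  # entries written so far

    def read(self, broker, start, end):
        return self.entries[broker][start:end]


class Broker:
    def __init__(self, name, log, metadata):
        self.name, self.log, self.metadata = name, log, metadata
        self.read_positions = defaultdict(int)  # peer -> entries already read
        self.seen = []

    def publish(self, event):
        # Write to this broker's own log, then advertise the new length.
        length = self.log.append(self.name, event)
        self.metadata.put(f"/events/{self.name}", length)

    def follow(self, peer):
        self.metadata.watch(f"/events/{peer}", self._on_change)

    def _on_change(self, key, length):
        # Read only the entries added since the last notification.
        peer = key.rsplit("/", 1)[1]
        start = self.read_positions[peer]
        self.seen.extend(self.log.read(peer, start, length))
        self.read_positions[peer] = length


log, metadata = SharedLog(), MetadataStore()
a = Broker("broker-a", log, metadata)
b = Broker("broker-b", log, metadata)
b.follow("broker-a")
a.publish(("created", "topic-1"))
a.publish(("deleted", "topic-1"))
```

In this model broker-a never talks to broker-b directly. Both only touch the
log and the metadata store, which is the isolation property Joe describes.
The extra hop through the data layer is where the added latency comes from.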

--
Devin G. Bost

On Thu, Apr 22, 2021, 10:22 PM Joe Francis <jo...@verizonmedia.com.invalid>
wrote:

> To be clear, I would love to have this feature. But I would not use this
> feature if it means that whenever a broker that hosts a "system topic" has a
> hiccup, it results in an outage for N other brokers. I run 100+
> brokers/million+ topics in a cluster (hence an "audit topic" would be
> wonderful for all kinds of purposes), and would not want a "system topic"
> as the single point of failure.
>
> So you have to make this log local to the broker, or sacrifice the
> reliability of the log (a best-effort log). A local log has its advantages:
> you can log a lot more about the system itself into it (e.g. security events
> like failed auth), but you will need to provide an aggregate view for
> the cluster as a whole from all the brokers.
>
> Joe

Re: [E] Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Joe Francis <jo...@verizonmedia.com.INVALID>.
To be clear, I would love to have this feature. But I would not use this
feature if it means that whenever a broker that hosts a "system topic" has a
hiccup, it results in an outage for N other brokers. I run 100+
brokers/million+ topics in a cluster (hence an "audit topic" would be
wonderful for all kinds of purposes), and would not want a "system topic"
as the single point of failure.

So you have to make this log local to the broker, or sacrifice the
reliability of the log (a best-effort log). A local log has its advantages:
you can log a lot more about the system itself into it (e.g. security events
like failed auth), but you will need to provide an aggregate view for
the cluster as a whole from all the brokers.
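
One possible shape for such an aggregate view, sketched in Python under the
assumption that each broker keeps its own log sorted by timestamp and that
timestamps are comparable across brokers. The aggregate function, its
parameters, and the sample events are all hypothetical.

```python
import heapq


def aggregate(local_logs, unavailable=frozenset()):
    """Merge per-broker logs into one stream ordered by timestamp.

    local_logs maps a broker name to a list of (timestamp, event) pairs that
    is already sorted by timestamp. Brokers listed in `unavailable` are
    skipped and reported separately instead of blocking the merge.
    """
    readable = [
        [(ts, broker, event) for ts, event in entries]
        for broker, entries in sorted(local_logs.items())
        if broker not in unavailable
    ]
    merged = list(heapq.merge(*readable))
    missing = sorted(set(local_logs) & set(unavailable))
    return merged, missing


local_logs = {
    "broker-1": [(1, "topic-a created"), (4, "auth failed for client-x")],
    "broker-2": [(2, "topic-b created"), (3, "topic-b deleted")],
}

full_view, no_gaps = aggregate(local_logs)
partial_view, gaps = aggregate(local_logs, unavailable={"broker-2"})
```

When broker-2 is unreachable, the view still contains broker-1's events and
names broker-2 as a known gap, so one broker's hiccup degrades the audit view
rather than causing an outage for the others.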

Joe




On Thu, Apr 22, 2021 at 6:10 AM Joe Francis <jo...@verizonmedia.com> wrote:

> Completely disagree that we have accepted this risk with PIP-39. That is
> different because it is an admin flow. A failure in a namespace policy
> change does not affect data flow.
>
>  What you are proposing  is in the data path. Topics and subs are
> created in the data flow path. Failure means outages. PIP-39 is not going
> to help you there.
>
> Joe
>
> On Wed, Apr 21, 2021 at 11:10 PM Michael Marshall <mi...@gmail.com>
> wrote:
>
>> Hi Joe,
>>
>> I agree there is a risk in adding more interdependencies between brokers. I
>> will point out that we have already accepted this risk with the
>> implementation of PIP 39, which propagates namespace policy changes to
>> other brokers using messages sent to a system topic. However, that doesn't
>> necessarily mean we should build more interdependencies between brokers.
>>
>> Here is the link to PIP 39:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_pulsar_wiki_PIP-2D39-253A-2DNamespace-2DChange-2DEvents&d=DwIBaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=ke-uQGYh--_pwtgn13szgq-axZcRVTJXoSurefbZEk4&m=ACSaHFP9BC5MVQZZWMp0KvSJnvwUr4Jvd08xKKbQWBI&s=G_K7a-seNfGGb-Z4Wy0Q5iMrbdL2j9WCoMUWfwUH5RY&e=
>> .
>>
>> I will look into the implementation of PIP 39 to better understand its
>> design, as I think it will likely influence this feature's design.
>>
>> Thanks,
>> Michael
>>
>> On Wed, Apr 21, 2021 at 5:50 PM Joe F <jo...@gmail.com> wrote:
>>
>> > I would be very careful about implementing  such a feature, because of
>> > introducing  undesirable interdependencies. Broker processes only talk
>> to
>> > the metadata store or data store. This keeps brokers isolated from each
>> > other - one broker is not dependent on the functioning of another
>> broker.
>> >
>> > A broker publishing to a topic hosted on another broker (which for eg:
>> is
>> > serving "system topic"),  sets up an undesirable dependency,  which
>> reduces
>> > total system resiliency and availability for the cluster. These are
>> better
>> > implemented as notifications off the metadata changes.
>> >
>> > Good feature, but needs careful thought to do it right
>> > Joe
>> >
>> > On Wed, Apr 21, 2021 at 4:03 PM Michael Marshall <mikemarsh17@gmail.com
>> >
>> > wrote:
>> >
>> > > Thanks for your response, PengHui.
>> > >
>> > > I think this feature would be useful to end users for cluster
>> management,
>> > > which is why I want to contribute a first class feature instead of
>> > writing
>> > > my own plugin that would add little value to the community.
>> > >
>> > > > With the broker interceptor you can intercept all the REST API
>> request
>> > > and response, Pulsar commands between the broker and clients.
>> > >
>> > > Based on looking through the interceptor trait, I don't see a way to
>> > > trigger events based on auto created/deleted topics. For example,
>> when a
>> > > producer connects to a broker for a nonexistent topic (assuming auto
>> > topic
>> > > creation is allowed), a managed ledger, and thus a topic, is created
>> > > without ever interacting with that interceptor trait. The same
>> appears to
>> > > be true for garbage collected topics. I think we'll need more than
>> this
>> > > interceptor to properly capture all cases where topics are created or
>> > > deleted.
>> > >
>> > > Regarding my reference to potential further work, it does appear that
>> low
>> > > level auditing of connections and pulsar commands could be covered by
>> the
>> > > interceptor. However, it would still be on the end user to implement
>> such
>> > > functionality.
>> > >
>> > > Thanks,
>> > > Michael
>> > >
>> > >
>> > > On Wed, Apr 21, 2021 at 3:51 AM PengHui Li <co...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Michael,
>> > > >
>> > > > Currently, Pulsar supports a pluginable Broker Interceptor, you can
>> > find
>> > > > it here
>> > > >
>> > >
>> >
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_pulsar_blob_6704f12104219611164aa2bb5bbdfc929613f1bf_pulsar-2Dbroker_src_main_java_org_apache_pulsar_broker_intercept_BrokerInterceptor.java&d=DwIBaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=ke-uQGYh--_pwtgn13szgq-axZcRVTJXoSurefbZEk4&m=ACSaHFP9BC5MVQZZWMp0KvSJnvwUr4Jvd08xKKbQWBI&s=6Li1guS8lImjrxPo9A0nnQAmDMnYEKHlAGqlVYvB8Ug&e=
>> > > >
>> > > > With the broker interceptor you can intercept all the REST API
>> request
>> > > and
>> > > > response, Pulsar commands between the broker and clients.
>> > > > This can be used to audit the system events.
>> > > >
>> > > > Thanks,
>> > > > Penghui
>> > > > On Apr 21, 2021, 5:13 AM +0800, Michael Marshall <
>> > mikemarsh17@gmail.com
>> > > >,
>> > > > wrote:
>> > > > > Hello all,
>> > > > >
>> > > > > I would like to propose adding a new feature to Pulsar that will
>> > > require
>> > > > a
>> > > > > PIP. In addition to feedback on the proposed feature, I am looking
>> > for
>> > > > > guidance on how to go about creating the PIP. Thanks for any help
>> you
>> > > can
>> > > > > provide.
>> > > > >
>> > > > > I would like to add an optional system topic where topic creation
>> and
>> > > > topic
>> > > > > deletion events are published. This feature will make it easier to
>> > > > leverage
>> > > > > the auto topic creation and inactive topic deletion features by
>> > > > providing a
>> > > > > way for users to reactively discover changes to topics. The
>> largest
>> > > > benefit
>> > > > > is that users won't need to poll for these updates with an admin
>> > > client.
>> > > > > Instead, they will get them as messages.
>> > > > >
>> > > > > I looked to see if an equivalent feature already exists, but I
>> don't
>> > > see
>> > > > > one. For reference, the `PatternMultiTopicsConsumerImpl` currently
>> > > polls
>> > > > > for all topics in a namespace and then does set operations to
>> compute
>> > > the
>> > > > > "new" topics to which it should subscribe. This client
>> implementation
>> > > > could
>> > > > > possibly leverage the new feature.
>> > > > >
>> > > > > There are still details I need to work out, like how it will work
>> for
>> > > > > partitioned vs unpartitioned topics and what kind of guarantees we
>> > have
>> > > > > regarding messaging semantics (I think we'll want at least once
>> > message
>> > > > > delivery here). I plan to include these details in the PIP with
>> > > > discussions
>> > > > > about trade offs for different implementations.
>> > > > >
>> > > > > Does this feature sound helpful and reasonable to others? If so,
>> is
>> > the
>> > > > > next step to formally write a proposal in a Google Doc or to put
>> > > > together a
>> > > > > doc on the Pulsar GitHub Wiki?
>> > > > >
>> > > > > Related and/or future work to consider in this design: I can see
>> > adding
>> > > > > different system topics for these types of auditable system
>> events.
>> > We
>> > > > > currently rely on log lines as our primary way for end users to
>> audit
>> > > > > system events, e.g. a producer connecting to a broker or a
>> > subscription
>> > > > > getting created, but we could instead have topics that represent
>> > > streams
>> > > > of
>> > > > > these different kinds of events. A persistent topic could make
>> these
>> > > > audit
>> > > > > events more durable and more structured which should lend
>> themselves
>> > to
>> > > > > being more easily analyzed. Further, users could choose to turn
>> > on/off
>> > > > > these audit events, perhaps at the broker or namespace level, to
>> fit
>> > > > their
>> > > > > own needs.
>> > > > >
>> > > > > Let me know what you think and how I should proceed.
>> > > > >
>> > > > > Regards,
>> > > > > Michael Marshall
>> > > >
>> > >
>> >
>>
>

Re: [E] Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Joe Francis <jo...@verizonmedia.com.INVALID>.
I completely disagree that we have accepted this risk with PIP-39. That is
different because it is an admin flow. A failure in a namespace policy
change does not affect data flow.

What you are proposing is in the data path. Topics and subscriptions are
created in the data flow path. Failure means outages. PIP-39 is not going
to help you there.

Joe

On Wed, Apr 21, 2021 at 11:10 PM Michael Marshall <mi...@gmail.com>
wrote:

> Hi Joe,
>
> I agree there is a risk in adding more interdependencies between brokers. I
> will point out that we have already accepted this risk with the
> implementation of PIP 39, which propagates namespace policy changes to
> other brokers using messages sent to a system topic. However, that doesn't
> necessarily mean we should build more interdependencies between brokers.
>
> Here is the link to PIP 39:
>
> https://github.com/apache/pulsar/wiki/PIP-39%3A-Namespace-Change-Events.
>
> I will look into the implementation of PIP 39 to better understand its
> design, as I think it will likely influence this feature's design.
>
> Thanks,
> Michael

Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Michael Marshall <mi...@gmail.com>.
Hi Joe,

I agree there is a risk in adding more interdependencies between brokers. I
will point out that we have already accepted this risk with the
implementation of PIP 39, which propagates namespace policy changes to
other brokers using messages sent to a system topic. However, that doesn't
necessarily mean we should build more interdependencies between brokers.

Here is the link to PIP 39:
https://github.com/apache/pulsar/wiki/PIP-39%3A-Namespace-Change-Events.

I will look into the implementation of PIP 39 to better understand its
design, as I think it will likely influence this feature's design.

Thanks,
Michael

On Wed, Apr 21, 2021 at 5:50 PM Joe F <jo...@gmail.com> wrote:

> I would be very careful about implementing  such a feature, because of
> introducing  undesirable interdependencies. Broker processes only talk to
> the metadata store or data store. This keeps brokers isolated from each
> other - one broker is not dependent on the functioning of another broker.
>
> A broker publishing to a topic hosted on another broker (which for eg: is
> serving "system topic"),  sets up an undesirable dependency,  which reduces
> total system resiliency and availability for the cluster. These are better
> implemented as notifications off the metadata changes.
>
> Good feature, but needs careful thought to do it right
> Joe

Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Joe F <jo...@gmail.com>.
I would be very careful about implementing such a feature, because it risks
introducing undesirable interdependencies. Broker processes only talk to
the metadata store or data store. This keeps brokers isolated from each
other: one broker is not dependent on the functioning of another broker.

A broker publishing to a topic hosted on another broker (e.g. one that is
serving a "system topic") sets up an undesirable dependency, which reduces
total system resiliency and availability for the cluster. These are better
implemented as notifications off the metadata changes.
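
One way to picture notifications derived from metadata changes is a diff of
two snapshots of the topic names in a namespace, with an event emitted for
each difference. The sketch below is only that: `TopicChangeDiff`, `Kind`, and
the example topic names are made up for illustration, and how the snapshots
would be obtained from the metadata store is left open.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names are Pulsar APIs.
class TopicChangeDiff {
    enum Kind { CREATED, DELETED }

    // Compares two snapshots of the topic names in a namespace and returns
    // one entry per topic that appeared or disappeared, keyed by topic name.
    static Map<String, Kind> diff(Set<String> before, Set<String> after) {
        Map<String, Kind> changes = new TreeMap<>();
        for (String topic : after) {
            if (!before.contains(topic)) {
                changes.put(topic, Kind.CREATED);
            }
        }
        for (String topic : before) {
            if (!after.contains(topic)) {
                changes.put(topic, Kind.DELETED);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Set<String> before = new HashSet<>(Arrays.asList(
                "persistent://tenant/ns/orders", "persistent://tenant/ns/payments"));
        Set<String> after = new HashSet<>(Arrays.asList(
                "persistent://tenant/ns/payments", "persistent://tenant/ns/refunds"));
        // Prints one line per change, sorted by topic name.
        diff(before, after).forEach((topic, kind) -> System.out.println(kind + " " + topic));
    }
}
```

Because the events are computed from metadata state, no broker has to publish
to a topic hosted on another broker in order to produce them.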

Good feature, but it needs careful thought to do it right.
Joe

On Wed, Apr 21, 2021 at 4:03 PM Michael Marshall <mi...@gmail.com>
wrote:

> Thanks for your response, PengHui.
>
> I think this feature would be useful to end users for cluster management,
> which is why I want to contribute a first class feature instead of writing
> my own plugin that would add little value to the community.
>
> > With the broker interceptor you can intercept all the REST API request
> and response, Pulsar commands between the broker and clients.
>
> Based on looking through the interceptor trait, I don't see a way to
> trigger events based on auto created/deleted topics. For example, when a
> producer connects to a broker for a nonexistent topic (assuming auto topic
> creation is allowed), a managed ledger, and thus a topic, is created
> without ever interacting with that interceptor trait. The same appears to
> be true for garbage collected topics. I think we'll need more than this
> interceptor to properly capture all cases where topics are created or
> deleted.
>
> Regarding my reference to potential further work, it does appear that low
> level auditing of connections and pulsar commands could be covered by the
> interceptor. However, it would still be on the end user to implement such
> functionality.
>
> Thanks,
> Michael

Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by Michael Marshall <mi...@gmail.com>.
Thanks for your response, PengHui.

I think this feature would be useful to end users for cluster management,
which is why I want to contribute a first-class feature instead of writing
my own plugin that would add little value to the community.

> With the broker interceptor you can intercept all the REST API request
> and response, Pulsar commands between the broker and clients.

Based on looking through the interceptor interface, I don't see a way to
trigger events based on auto-created/deleted topics. For example, when a
producer connects to a broker for a nonexistent topic (assuming auto topic
creation is allowed), a managed ledger, and thus a topic, is created
without ever interacting with that interceptor interface. The same appears to
be true for garbage-collected topics. I think we'll need more than this
interceptor to properly capture all cases where topics are created or
deleted.

Regarding my reference to potential further work, it does appear that
low-level auditing of connections and Pulsar commands could be covered by the
interceptor. However, it would still be on the end user to implement such
functionality.

Thanks,
Michael


On Wed, Apr 21, 2021 at 3:51 AM PengHui Li <co...@gmail.com> wrote:

> Hi Michael,
>
> Currently, Pulsar supports a pluginable Broker Interceptor, you can find
> it here
> https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-broker/src/main/java/org/apache/pulsar/broker/intercept/BrokerInterceptor.java
>
> With the broker interceptor you can intercept all the REST API request and
> response, Pulsar commands between the broker and clients.
> This can be used to audit the system events.
>
> Thanks,
> Penghui

Re: [Discuss] PIP to add system topic for topic creation/deletion events

Posted by PengHui Li <co...@gmail.com>.
Hi Michael,

Currently, Pulsar supports a pluggable Broker Interceptor. You can find it here: https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-broker/src/main/java/org/apache/pulsar/broker/intercept/BrokerInterceptor.java

With the broker interceptor you can intercept all REST API requests and responses, as well as the Pulsar commands exchanged between the broker and clients.
This can be used to audit system events.

Thanks,
Penghui
On Apr 21, 2021, 5:13 AM +0800, Michael Marshall <mi...@gmail.com>, wrote:
> Hello all,
>
> I would like to propose adding a new feature to Pulsar that will require a
> PIP. In addition to feedback on the proposed feature, I am looking for
> guidance on how to go about creating the PIP. Thanks for any help you can
> provide.
>
> I would like to add an optional system topic where topic creation and topic
> deletion events are published. This feature will make it easier to leverage
> the auto topic creation and inactive topic deletion features by providing a
> way for users to reactively discover changes to topics. The largest benefit
> is that users won't need to poll for these updates with an admin client.
> Instead, they will get them as messages.
>
> I looked to see if an equivalent feature already exists, but I don't see
> one. For reference, the `PatternMultiTopicsConsumerImpl` currently polls
> for all topics in a namespace and then does set operations to compute the
> "new" topics to which it should subscribe. This client implementation could
> possibly leverage the new feature.
>
> There are still details I need to work out, like how it will work for
> partitioned vs unpartitioned topics and what kind of guarantees we have
> regarding messaging semantics (I think we'll want at least once message
> delivery here). I plan to include these details in the PIP with discussions
> about trade offs for different implementations.
>
> Does this feature sound helpful and reasonable to others? If so, is the
> next step to formally write a proposal in a Google Doc or to put together a
> doc on the Pulsar GitHub Wiki?
>
> Related and/or future work to consider in this design: I can see adding
> different system topics for these types of auditable system events. We
> currently rely on log lines as our primary way for end users to audit
> system events, e.g. a producer connecting to a broker or a subscription
> getting created, but we could instead have topics that represent streams of
> these different kinds of events. A persistent topic could make these audit
> events more durable and more structured which should lend themselves to
> being more easily analyzed. Further, users could choose to turn on/off
> these audit events, perhaps at the broker or namespace level, to fit their
> own needs.
>
> Let me know what you think and how I should proceed.
>
> Regards,
> Michael Marshall