You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@pulsar.apache.org by zhaocong <zh...@apache.org> on 2022/06/08 03:22:05 UTC

[DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Hello Pulsar Community,


Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
I look forward to your feedback.


PIP: https://github.com/apache/pulsar/issues/15859


Thanks,

Cong Zhao

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Cong Zhao <zh...@apache.org>.

Hi Michael,

Thanks for your feedback.

> What are the use cases where a client needs to know the cluster health
> other than for auto-failover? 
For the time being, I only think of the case of auto-failover.

> The question of cluster health is
> complicated, given that Pulsar is a distributed system, and I do not
> view the broker health check as a reliable proxy for cluster health. I
> view it as an indicator of a single broker's health at a single point
> in time.
This is indeed a problem, but I think using the `HEALTH_CHECK` command to check the health of the cluster for auto-failover cases should be better than just probing the port before.

Thanks,
Cong Zhao

On 2022/07/01 05:47:46 Michael Marshall wrote:
> Thanks for your response Cong Zhao.
> 
> > I think don't need to handle it, because the client API should be consistent with the `HEALTH_CHECK` command result, and users can retry it if they need.
> 
> I think I may be coming at this from a different perspective. A client
> application does not need to know about the health of an individual
> broker, so I don't think we should expose it explicitly through the
> client API. A client's primary concern is whether it can send/receive
> its messages, and a client's notion of cluster health should only be
> based upon whether it can send/receive its messages.
> 
> > Also, for the main problem that this proposal wants to solve (how to check a new cluster is healthy). Do you have another better idea?
> 
> What are the use cases where a client needs to know the cluster health
> other than for auto-failover? The question of cluster health is
> complicated, given that Pulsar is a distributed system, and I do not
> view the broker health check as a reliable proxy for cluster health. I
> view it as an indicator of a single broker's health at a single point
> in time.
> 
> Thanks,
> Michael
> 
> 
> On Fri, Jun 24, 2022 at 4:43 AM Cong Zhao <zh...@apache.org> wrote:
> >
> > Hi Michael,
> >
> > > I think the current PIP might need some clarification on how errors
> > > are handled. For example, if a single broker fails to respond because
> > > it was being restarted, how would the client handle that kind of
> > > failure with this feature?
> >
> > I think don't need to handle it, because the client API should be consistent with the `HEALTH_CHECK` command result, and users can retry it if they need.
> >
> > > I wasn't suggesting that the client would need to ask the broker for
> > > each of the producers/consumers, but rather that the client would
> > > monitor producers/consumers locally and make decisions about cluster
> > > health. For example, if a producer cannot connect to its target topic
> > > after some amount of time or some number of retries, or if a producer
> > > can connect but cannot publish a message successfully within some
> > > amount of time, then the client could consider the cluster to be
> > > unhealthy.
> > >
> > > > This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
> > >
> > > The client will discover if one of its targeted topics is unavailable,
> > > so instead of monitoring the broker's health check topic, I think the
> > > client should monitor/failover when a targeted topic is "unavailable"
> > > for some configured length of time.
> > >
> > > I support making the auto-failover logic more robust, but I don't
> > > think the broker health check is the right signal to use for overall
> > > cluster health. In my view, the broker's health check is meant to
> > > signal to orchestrators (like Kubernetes) when a broker ought to be
> > > restarted.
> >
> > For the currently connected cluster, we really can't think the current topic is unavailable just because the `HEALTH_CHECK` command result is unhealthy, the current means for auto-failover are relatively rude. I think we improve it by adding extra measures such as mentioned above to you, but this doesn't fall within the scope of the proposal.
> >
> > Also, for the main problem that this proposal wants to solve (how to check a new cluster is healthy). Do you have another better idea?
> >
> > Thanks,
> > Cong Zhao
> >
> > On 2022/06/24 05:25:46 Michael Marshall wrote:
> > > Thanks for your replies Cong Zhao.
> > >
> > > I think the current PIP might need some clarification on how errors
> > > are handled. For example, if a single broker fails to respond because
> > > it was being restarted, how would the client handle that kind of
> > > failure with this feature?
> > >
> > > > This is a good definition of cluster health, but we can't check all topics that would add a lot of load on cleint and broker.
> > >
> > > I wasn't suggesting that the client would need to ask the broker for
> > > each of the producers/consumers, but rather that the client would
> > > monitor producers/consumers locally and make decisions about cluster
> > > health. For example, if a producer cannot connect to its target topic
> > > after some amount of time or some number of retries, or if a producer
> > > can connect but cannot publish a message successfully within some
> > > amount of time, then the client could consider the cluster to be
> > > unhealthy.
> > >
> > > > This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
> > >
> > > The client will discover if one of its targeted topics is unavailable,
> > > so instead of monitoring the broker's health check topic, I think the
> > > client should monitor/failover when a targeted topic is "unavailable"
> > > for some configured length of time.
> > >
> > > I support making the auto-failover logic more robust, but I don't
> > > think the broker health check is the right signal to use for overall
> > > cluster health. In my view, the broker's health check is meant to
> > > signal to orchestrators (like Kubernetes) when a broker ought to be
> > > restarted.
> > >
> > > Thanks,
> > > Michael
> > >
> > >
> > > On Thu, Jun 23, 2022 at 12:35 AM Cong Zhao <zh...@apache.org> wrote:
> > > >
> > > > Hi Michael,
> > > >
> > > > Thanks for your feedback.
> > > >
> > > > > I define a client's primary cluster as "healthy" when it is "healthy"
> > > > for all of its producers and consumers. I define a healthy producer as
> > > > one that can connect to a topic and publish messages within certain
> > > > latency and throughput thresholds (configured by the user), and I
> > > > define a healthy consumer as one that can connect to a topic and
> > > > consume messages when there are messages to be consumed (possibly
> > > > within a certain latency?).
> > > >
> > > > This is a good definition of cluster health, but we can't check all topics that would add a lot of load on cleint and broker.
> > > >
> > > > > By the above definitions, I don't think the broker's health check will
> > > > give us the right notion of "healthy" because that health check
> > > > monitors producing/consuming to/from the health check topic, not the
> > > > client's target topics. One primary difference is that a health check
> > > > topic could have a different persistence policy, which means the
> > > > client could incorrectly classify the broker as healthy when there
> > > > aren't enough available bookies for a producer's target topic.
> > > >
> > > > This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
> > > >
> > > > I think if the client implementation doesn't meet the user's needs, they can also override the `healthCheck` method based on the `HEALTH_CHECK` command.
> > > >
> > > > Thanks,
> > > > Cong Zhao
> > > >
> > > > On 2022/06/22 19:06:25 Michael Marshall wrote:
> > > > > I'd like to clarify the motivation for this PIP. My understanding is
> > > > > that the primary motivation is to give clients a robust way to
> > > > > classify a cluster as "healthy". The initial beneficiary of this
> > > > > feature is the auto failover use case. I think the feature makes
> > > > > sense, but before using the broker's concept of "healthy" as defined
> > > > > in the broker health check, I think we should define what constitutes
> > > > > a "healthy cluster" from the client's perspective.
> > > > >
> > > > > I define a client's primary cluster as "healthy" when it is "healthy"
> > > > > for all of its producers and consumers. I define a healthy producer as
> > > > > one that can connect to a topic and publish messages within certain
> > > > > latency and throughput thresholds (configured by the user), and I
> > > > > define a healthy consumer as one that can connect to a topic and
> > > > > consume messages when there are messages to be consumed (possibly
> > > > > within a certain latency?).
> > > > >
> > > > > By the above definitions, I don't think the broker's health check will
> > > > > give us the right notion of "healthy" because that health check
> > > > > monitors producing/consuming to/from the health check topic, not the
> > > > > client's target topics. One primary difference is that a health check
> > > > > topic could have a different persistence policy, which means the
> > > > > client could incorrectly classify the broker as healthy when there
> > > > > aren't enough available bookies for a producer's target topic.
> > > > >
> > > > > The broker health check also includes checks that we probably don't
> > > > > want to use to classify whole clusters as "unhealthy". For example, if
> > > > > the broker is deadlocked, it will be considered unhealthy. In
> > > > > Kubernetes, that broker will be restarted "soon", and the topics will
> > > > > be scheduled to another broker. I probably wouldn't consider a
> > > > > whole cluster as "unhealthy" because a single broker was deadlocked.
> > > > > Instead, I'd consider a cluster unhealthy when latency/throughput are
> > > > > not meeting expectations, which could happen because a broker is
> > > > > deadlocked. Further, there is a chance that the deadlock in the broker
> > > > > didn't affect the client's producers and consumers, which is yet
> > > > > another reason not to failover to another cluster based on a failed
> > > > > broker health check.
> > > > >
> > > > > I look forward to hearing your definitions of client health.
> > > > >
> > > > > Thanks,
> > > > > Michael
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jun 22, 2022 at 8:30 AM Cong Zhao <zh...@apache.org> wrote:
> > > > > >
> > > > > > Yes, there may have multiple clients request the HC at the same time in the AutoFailover case, so we should add some cache to reduce broker load.
> > > > > >
> > > > > > On 2022/06/22 12:55:49 Enrico Olivelli wrote:
> > > > > > > Il giorno mer 22 giu 2022 alle ore 14:45 Cong Zhao
> > > > > > > <zh...@apache.org> ha scritto:
> > > > > > > >
> > > > > > > > Hi Enrico,
> > > > > > > >
> > > > > > > > > Also, I would like to understand in which usecase you can use the
> > > > > > > > > binary endpoint and not the HTTP endpoint.
> > > > > > > >
> > > > > > > > We can't use the HTTP endpoint when the client did not have the admin auth to do a health check. but we need it in some cases such as auto failover on the client-side (https://github.com/apache/pulsar/pull/13316#discussion_r773313991)
> > > > > > > AutoFailover is a valid use case for me.
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > > > > > "client application" to run the HC
> > > > > > > >
> > > > > > > > Adding a health check API to the client-side just to make it easier to use this feature, this check still works on broker.
> > > > > > >
> > > > > > > makes sense, but usually the HC, like in k8s or in other environments
> > > > > > > is run every X seconds and usually not concurrently
> > > > > > >
> > > > > > > if you have multiple (tens? hundreds?) of Pulsar clients that require
> > > > > > > the HC, this will be a big problem,
> > > > > > > is this the reason why you want to add some cache to the response of the HC ?
> > > > > > >
> > > > > > > Enrico
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2022/06/22 10:19:52 Enrico Olivelli wrote:
> > > > > > > > > I believe that this proposal is too broad.
> > > > > > > > >
> > > > > > > > > the PIP reads about:
> > > > > > > > > - adding HEALTHCHECK to the binary protocol
> > > > > > > > > - add a HEALTHCHECK cache on the broker
> > > > > > > > >
> > > > > > > > > Also, I would like to understand in which usecase you can use the
> > > > > > > > > binary endpoint and not the HTTP endpoint.
> > > > > > > > >
> > > > > > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > > > > > "client application" to run the HC
> > > > > > > > >
> > > > > > > > > Can you please illustrate some practical use cases?
> > > > > > > > >
> > > > > > > > > Enric
> > > > > > > > >
> > > > > > > > > Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
> > > > > > > > > ha scritto:
> > > > > > > > > >
> > > > > > > > > > Hello Pulsar Community,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> > > > > > > > > > I look forward to your feedback.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > PIP: https://github.com/apache/pulsar/issues/15859
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Cong Zhao
> > > > > > > > >
> > > > > > >
> > > > >
> > >
>

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Michael Marshall <mm...@apache.org>.

Thanks for your response Cong Zhao.

> I think don't need to handle it, because the client API should be consistent with the `HEALTH_CHECK` command result, and users can retry it if they need.

I think I may be coming at this from a different perspective. A client
application does not need to know about the health of an individual
broker, so I don't think we should expose it explicitly through the
client API. A client's primary concern is whether it can send/receive
its messages, and a client's notion of cluster health should only be
based upon whether it can send/receive its messages.

> Also, for the main problem that this proposal wants to solve (how to check a new cluster is healthy). Do you have another better idea?

What are the use cases where a client needs to know the cluster health
other than for auto-failover? The question of cluster health is
complicated, given that Pulsar is a distributed system, and I do not
view the broker health check as a reliable proxy for cluster health. I
view it as an indicator of a single broker's health at a single point
in time.

Thanks,
Michael


On Fri, Jun 24, 2022 at 4:43 AM Cong Zhao <zh...@apache.org> wrote:
>
> Hi Michael,
>
> > I think the current PIP might need some clarification on how errors
> > are handled. For example, if a single broker fails to respond because
> > it was being restarted, how would the client handle that kind of
> > failure with this feature?
>
> I think don't need to handle it, because the client API should be consistent with the `HEALTH_CHECK` command result, and users can retry it if they need.
>
> > I wasn't suggesting that the client would need to ask the broker for
> > each of the producers/consumers, but rather that the client would
> > monitor producers/consumers locally and make decisions about cluster
> > health. For example, if a producer cannot connect to its target topic
> > after some amount of time or some number of retries, or if a producer
> > can connect but cannot publish a message successfully within some
> > amount of time, then the client could consider the cluster to be
> > unhealthy.
> >
> > > This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
> >
> > The client will discover if one of its targeted topics is unavailable,
> > so instead of monitoring the broker's health check topic, I think the
> > client should monitor/failover when a targeted topic is "unavailable"
> > for some configured length of time.
> >
> > I support making the auto-failover logic more robust, but I don't
> > think the broker health check is the right signal to use for overall
> > cluster health. In my view, the broker's health check is meant to
> > signal to orchestrators (like Kubernetes) when a broker ought to be
> > restarted.
>
> For the currently connected cluster, we really can't think the current topic is unavailable just because the `HEALTH_CHECK` command result is unhealthy, the current means for auto-failover are relatively rude. I think we improve it by adding extra measures such as mentioned above to you, but this doesn't fall within the scope of the proposal.
>
> Also, for the main problem that this proposal wants to solve (how to check a new cluster is healthy). Do you have another better idea?
>
> Thanks,
> Cong Zhao
>
> On 2022/06/24 05:25:46 Michael Marshall wrote:
> > Thanks for your replies Cong Zhao.
> >
> > I think the current PIP might need some clarification on how errors
> > are handled. For example, if a single broker fails to respond because
> > it was being restarted, how would the client handle that kind of
> > failure with this feature?
> >
> > > This is a good definition of cluster health, but we can't check all topics that would add a lot of load on cleint and broker.
> >
> > I wasn't suggesting that the client would need to ask the broker for
> > each of the producers/consumers, but rather that the client would
> > monitor producers/consumers locally and make decisions about cluster
> > health. For example, if a producer cannot connect to its target topic
> > after some amount of time or some number of retries, or if a producer
> > can connect but cannot publish a message successfully within some
> > amount of time, then the client could consider the cluster to be
> > unhealthy.
> >
> > > This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
> >
> > The client will discover if one of its targeted topics is unavailable,
> > so instead of monitoring the broker's health check topic, I think the
> > client should monitor/failover when a targeted topic is "unavailable"
> > for some configured length of time.
> >
> > I support making the auto-failover logic more robust, but I don't
> > think the broker health check is the right signal to use for overall
> > cluster health. In my view, the broker's health check is meant to
> > signal to orchestrators (like Kubernetes) when a broker ought to be
> > restarted.
> >
> > Thanks,
> > Michael
> >
> >
> > On Thu, Jun 23, 2022 at 12:35 AM Cong Zhao <zh...@apache.org> wrote:
> > >
> > > Hi Michael,
> > >
> > > Thanks for your feedback.
> > >
> > > > I define a client's primary cluster as "healthy" when it is "healthy"
> > > for all of its producers and consumers. I define a healthy producer as
> > > one that can connect to a topic and publish messages within certain
> > > latency and throughput thresholds (configured by the user), and I
> > > define a healthy consumer as one that can connect to a topic and
> > > consume messages when there are messages to be consumed (possibly
> > > within a certain latency?).
> > >
> > > This is a good definition of cluster health, but we can't check all topics that would add a lot of load on cleint and broker.
> > >
> > > > By the above definitions, I don't think the broker's health check will
> > > give us the right notion of "healthy" because that health check
> > > monitors producing/consuming to/from the health check topic, not the
> > > client's target topics. One primary difference is that a health check
> > > topic could have a different persistence policy, which means the
> > > client could incorrectly classify the broker as healthy when there
> > > aren't enough available bookies for a producer's target topic.
> > >
> > > This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
> > >
> > > I think if the client implementation doesn't meet the user's needs, they can also override the `healthCheck` method based on the `HEALTH_CHECK` command.
> > >
> > > Thanks,
> > > Cong Zhao
> > >
> > > On 2022/06/22 19:06:25 Michael Marshall wrote:
> > > > I'd like to clarify the motivation for this PIP. My understanding is
> > > > that the primary motivation is to give clients a robust way to
> > > > classify a cluster as "healthy". The initial beneficiary of this
> > > > feature is the auto failover use case. I think the feature makes
> > > > sense, but before using the broker's concept of "healthy" as defined
> > > > in the broker health check, I think we should define what constitutes
> > > > a "healthy cluster" from the client's perspective.
> > > >
> > > > I define a client's primary cluster as "healthy" when it is "healthy"
> > > > for all of its producers and consumers. I define a healthy producer as
> > > > one that can connect to a topic and publish messages within certain
> > > > latency and throughput thresholds (configured by the user), and I
> > > > define a healthy consumer as one that can connect to a topic and
> > > > consume messages when there are messages to be consumed (possibly
> > > > within a certain latency?).
> > > >
> > > > By the above definitions, I don't think the broker's health check will
> > > > give us the right notion of "healthy" because that health check
> > > > monitors producing/consuming to/from the health check topic, not the
> > > > client's target topics. One primary difference is that a health check
> > > > topic could have a different persistence policy, which means the
> > > > client could incorrectly classify the broker as healthy when there
> > > > aren't enough available bookies for a producer's target topic.
> > > >
> > > > The broker health check also includes checks that we probably don't
> > > > want to use to classify whole clusters as "unhealthy". For example, if
> > > > the broker is deadlocked, it will be considered unhealthy. In
> > > > Kubernetes, that broker will be restarted "soon", and the topics will
> > > > be scheduled to another broker. I probably wouldn't consider a
> > > > whole cluster as "unhealthy" because a single broker was deadlocked.
> > > > Instead, I'd consider a cluster unhealthy when latency/throughput are
> > > > not meeting expectations, which could happen because a broker is
> > > > deadlocked. Further, there is a chance that the deadlock in the broker
> > > > didn't affect the client's producers and consumers, which is yet
> > > > another reason not to failover to another cluster based on a failed
> > > > broker health check.
> > > >
> > > > I look forward to hearing your definitions of client health.
> > > >
> > > > Thanks,
> > > > Michael
> > > >
> > > >
> > > >
> > > > On Wed, Jun 22, 2022 at 8:30 AM Cong Zhao <zh...@apache.org> wrote:
> > > > >
> > > > > Yes, there may have multiple clients request the HC at the same time in the AutoFailover case, so we should add some cache to reduce broker load.
> > > > >
> > > > > On 2022/06/22 12:55:49 Enrico Olivelli wrote:
> > > > > > Il giorno mer 22 giu 2022 alle ore 14:45 Cong Zhao
> > > > > > <zh...@apache.org> ha scritto:
> > > > > > >
> > > > > > > Hi Enrico,
> > > > > > >
> > > > > > > > Also, I would like to understand in which usecase you can use the
> > > > > > > > binary endpoint and not the HTTP endpoint.
> > > > > > >
> > > > > > > We can't use the HTTP endpoint when the client did not have the admin auth to do a health check. but we need it in some cases such as auto failover on the client-side (https://github.com/apache/pulsar/pull/13316#discussion_r773313991)
> > > > > > AutoFailover is a valid use case for me.
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > > > > "client application" to run the HC
> > > > > > >
> > > > > > > Adding a health check API to the client-side just to make it easier to use this feature, this check still works on broker.
> > > > > >
> > > > > > makes sense, but usually the HC, like in k8s or in other environments
> > > > > > is run every X seconds and usually not concurrently
> > > > > >
> > > > > > if you have multiple (tens? hundreds?) of Pulsar clients that require
> > > > > > the HC, this will be a big problem,
> > > > > > is this the reason why you want to add some cache to the response of the HC ?
> > > > > >
> > > > > > Enrico
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 2022/06/22 10:19:52 Enrico Olivelli wrote:
> > > > > > > > I believe that this proposal is too broad.
> > > > > > > >
> > > > > > > > the PIP reads about:
> > > > > > > > - adding HEALTHCHECK to the binary protocol
> > > > > > > > - add a HEALTHCHECK cache on the broker
> > > > > > > >
> > > > > > > > Also, I would like to understand in which usecase you can use the
> > > > > > > > binary endpoint and not the HTTP endpoint.
> > > > > > > >
> > > > > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > > > > "client application" to run the HC
> > > > > > > >
> > > > > > > > Can you please illustrate some practical use cases?
> > > > > > > >
> > > > > > > > Enric
> > > > > > > >
> > > > > > > > Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
> > > > > > > > ha scritto:
> > > > > > > > >
> > > > > > > > > Hello Pulsar Community,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> > > > > > > > > I look forward to your feedback.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > PIP: https://github.com/apache/pulsar/issues/15859
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Cong Zhao
> > > > > > > >
> > > > > >
> > > >
> >

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Cong Zhao <zh...@apache.org>.

Hi Michael,

> I think the current PIP might need some clarification on how errors
> are handled. For example, if a single broker fails to respond because
> it was being restarted, how would the client handle that kind of
> failure with this feature?

I think don't need to handle it, because the client API should be consistent with the `HEALTH_CHECK` command result, and users can retry it if they need.

> I wasn't suggesting that the client would need to ask the broker for
> each of the producers/consumers, but rather that the client would
> monitor producers/consumers locally and make decisions about cluster
> health. For example, if a producer cannot connect to its target topic
> after some amount of time or some number of retries, or if a producer
> can connect but cannot publish a message successfully within some
> amount of time, then the client could consider the cluster to be
> unhealthy.
> 
> > This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
> 
> The client will discover if one of its targeted topics is unavailable,
> so instead of monitoring the broker's health check topic, I think the
> client should monitor/failover when a targeted topic is "unavailable"
> for some configured length of time.
> 
> I support making the auto-failover logic more robust, but I don't
> think the broker health check is the right signal to use for overall
> cluster health. In my view, the broker's health check is meant to
> signal to orchestrators (like Kubernetes) when a broker ought to be
> restarted.

For the currently connected cluster, we really can't think the current topic is unavailable just because the `HEALTH_CHECK` command result is unhealthy, the current means for auto-failover are relatively rude. I think we improve it by adding extra measures such as mentioned above to you, but this doesn't fall within the scope of the proposal.  

Also, for the main problem that this proposal wants to solve (how to check a new cluster is healthy). Do you have another better idea?

Thanks,
Cong Zhao

On 2022/06/24 05:25:46 Michael Marshall wrote:
> Thanks for your replies Cong Zhao.
> 
> I think the current PIP might need some clarification on how errors
> are handled. For example, if a single broker fails to respond because
> it was being restarted, how would the client handle that kind of
> failure with this feature?
> 
> > This is a good definition of cluster health, but we can't check all topics that would add a lot of load on cleint and broker.
> 
> I wasn't suggesting that the client would need to ask the broker for
> each of the producers/consumers, but rather that the client would
> monitor producers/consumers locally and make decisions about cluster
> health. For example, if a producer cannot connect to its target topic
> after some amount of time or some number of retries, or if a producer
> can connect but cannot publish a message successfully within some
> amount of time, then the client could consider the cluster to be
> unhealthy.
> 
> > This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
> 
> The client will discover if one of its targeted topics is unavailable,
> so instead of monitoring the broker's health check topic, I think the
> client should monitor/failover when a targeted topic is "unavailable"
> for some configured length of time.
> 
> I support making the auto-failover logic more robust, but I don't
> think the broker health check is the right signal to use for overall
> cluster health. In my view, the broker's health check is meant to
> signal to orchestrators (like Kubernetes) when a broker ought to be
> restarted.
> 
> Thanks,
> Michael
> 
> 
> On Thu, Jun 23, 2022 at 12:35 AM Cong Zhao <zh...@apache.org> wrote:
> >
> > Hi Michael,
> >
> > Thanks for your feedback.
> >
> > > I define a client's primary cluster as "healthy" when it is "healthy"
> > for all of its producers and consumers. I define a healthy producer as
> > one that can connect to a topic and publish messages within certain
> > latency and throughput thresholds (configured by the user), and I
> > define a healthy consumer as one that can connect to a topic and
> > consume messages when there are messages to be consumed (possibly
> > within a certain latency?).
> >
> > This is a good definition of cluster health, but we can't check all topics that would add a lot of load on cleint and broker.
> >
> > > By the above definitions, I don't think the broker's health check will
> > give us the right notion of "healthy" because that health check
> > monitors producing/consuming to/from the health check topic, not the
> > client's target topics. One primary difference is that a health check
> > topic could have a different persistence policy, which means the
> > client could incorrectly classify the broker as healthy when there
> > aren't enough available bookies for a producer's target topic.
> >
> > This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
> >
> > I think if the client implementation doesn't meet the user's needs, they can also override the `healthCheck` method based on the `HEALTH_CHECK` command.
> >
> > Thanks,
> > Cong Zhao
> >
> > On 2022/06/22 19:06:25 Michael Marshall wrote:
> > > I'd like to clarify the motivation for this PIP. My understanding is
> > > that the primary motivation is to give clients a robust way to
> > > classify a cluster as "healthy". The initial beneficiary of this
> > > feature is the auto failover use case. I think the feature makes
> > > sense, but before using the broker's concept of "healthy" as defined
> > > in the broker health check, I think we should define what constitutes
> > > a "healthy cluster" from the client's perspective.
> > >
> > > I define a client's primary cluster as "healthy" when it is "healthy"
> > > for all of its producers and consumers. I define a healthy producer as
> > > one that can connect to a topic and publish messages within certain
> > > latency and throughput thresholds (configured by the user), and I
> > > define a healthy consumer as one that can connect to a topic and
> > > consume messages when there are messages to be consumed (possibly
> > > within a certain latency?).
> > >
> > > By the above definitions, I don't think the broker's health check will
> > > give us the right notion of "healthy" because that health check
> > > monitors producing/consuming to/from the health check topic, not the
> > > client's target topics. One primary difference is that a health check
> > > topic could have a different persistence policy, which means the
> > > client could incorrectly classify the broker as healthy when there
> > > aren't enough available bookies for a producer's target topic.
> > >
> > > The broker health check also includes checks that we probably don't
> > > want to use to classify whole clusters as "unhealthy". For example, if
> > > the broker is deadlocked, it will be considered unhealthy. In
> > > Kubernetes, that broker will be restarted "soon", and the topics will
> > > be scheduled to another broker. I probably wouldn't consider a
> > > whole cluster as "unhealthy" because a single broker was deadlocked.
> > > Instead, I'd consider a cluster unhealthy when latency/throughput are
> > > not meeting expectations, which could happen because a broker is
> > > deadlocked. Further, there is a chance that the deadlock in the broker
> > > didn't affect the client's producers and consumers, which is yet
> > > another reason not to failover to another cluster based on a failed
> > > broker health check.
> > >
> > > I look forward to hearing your definitions of client health.
> > >
> > > Thanks,
> > > Michael
> > >
> > >
> > >
> > > On Wed, Jun 22, 2022 at 8:30 AM Cong Zhao <zh...@apache.org> wrote:
> > > >
> > > > Yes, there may have multiple clients request the HC at the same time in the AutoFailover case, so we should add some cache to reduce broker load.
> > > >
> > > > On 2022/06/22 12:55:49 Enrico Olivelli wrote:
> > > > > Il giorno mer 22 giu 2022 alle ore 14:45 Cong Zhao
> > > > > <zh...@apache.org> ha scritto:
> > > > > >
> > > > > > Hi Enrico,
> > > > > >
> > > > > > > Also, I would like to understand in which usecase you can use the
> > > > > > > binary endpoint and not the HTTP endpoint.
> > > > > >
> > > > > > We can't use the HTTP endpoint when the client did not have the admin auth to do a health check. but we need it in some cases such as auto failover on the client-side (https://github.com/apache/pulsar/pull/13316#discussion_r773313991)
> > > > > AutoFailover is a valid use case for me.
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > > > "client application" to run the HC
> > > > > >
> > > > > > Adding a health check API to the client-side just to make it easier to use this feature, this check still works on broker.
> > > > >
> > > > > makes sense, but usually the HC, like in k8s or in other environments
> > > > > is run every X seconds and usually not concurrently
> > > > >
> > > > > if you have multiple (tens? hundreds?) of Pulsar clients that require
> > > > > the HC, this will be a big problem,
> > > > > is this the reason why you want to add some cache to the response of the HC ?
> > > > >
> > > > > Enrico
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > On 2022/06/22 10:19:52 Enrico Olivelli wrote:
> > > > > > > I believe that this proposal is too broad.
> > > > > > >
> > > > > > > the PIP reads about:
> > > > > > > - adding HEALTHCHECK to the binary protocol
> > > > > > > - add a HEALTHCHECK cache on the broker
> > > > > > >
> > > > > > > Also, I would like to understand in which usecase you can use the
> > > > > > > binary endpoint and not the HTTP endpoint.
> > > > > > >
> > > > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > > > "client application" to run the HC
> > > > > > >
> > > > > > > Can you please illustrate some practical use cases?
> > > > > > >
> > > > > > > Enric
> > > > > > >
> > > > > > > Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
> > > > > > > ha scritto:
> > > > > > > >
> > > > > > > > Hello Pulsar Community,
> > > > > > > >
> > > > > > > >
> > > > > > > > Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> > > > > > > > I look forward to your feedback.
> > > > > > > >
> > > > > > > >
> > > > > > > > PIP: https://github.com/apache/pulsar/issues/15859
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Cong Zhao
> > > > > > >
> > > > >
> > >
>

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Michael Marshall <mm...@apache.org>.

Thanks for your replies Cong Zhao.

I think the current PIP might need some clarification on how errors
are handled. For example, if a single broker fails to respond because
it was being restarted, how would the client handle that kind of
failure with this feature?

> This is a good definition of cluster health, but we can't check all topics that would add a lot of load on cleint and broker.

I wasn't suggesting that the client would need to ask the broker for
each of the producers/consumers, but rather that the client would
monitor producers/consumers locally and make decisions about cluster
health. For example, if a producer cannot connect to its target topic
after some amount of time or some number of retries, or if a producer
can connect but cannot publish a message successfully within some
amount of time, then the client could consider the cluster to be
unhealthy.

> This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.

The client will discover if one of its targeted topics is unavailable,
so instead of monitoring the broker's health check topic, I think the
client should monitor/failover when a targeted topic is "unavailable"
for some configured length of time.

I support making the auto-failover logic more robust, but I don't
think the broker health check is the right signal to use for overall
cluster health. In my view, the broker's health check is meant to
signal to orchestrators (like Kubernetes) when a broker ought to be
restarted.

Thanks,
Michael


On Thu, Jun 23, 2022 at 12:35 AM Cong Zhao <zh...@apache.org> wrote:
>
> Hi Michael,
>
> Thanks for your feedback.
>
> > I define a client's primary cluster as "healthy" when it is "healthy"
> for all of its producers and consumers. I define a healthy producer as
> one that can connect to a topic and publish messages within certain
> latency and throughput thresholds (configured by the user), and I
> define a healthy consumer as one that can connect to a topic and
> consume messages when there are messages to be consumed (possibly
> within a certain latency?).
>
> This is a good definition of cluster health, but we can't check all topics that would add a lot of load on cleint and broker.
>
> > By the above definitions, I don't think the broker's health check will
> give us the right notion of "healthy" because that health check
> monitors producing/consuming to/from the health check topic, not the
> client's target topics. One primary difference is that a health check
> topic could have a different persistence policy, which means the
> client could incorrectly classify the broker as healthy when there
> aren't enough available bookies for a producer's target topic.
>
> This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.
>
> I think if the client implementation doesn't meet the user's needs, they can also override the `healthCheck` method based on the `HEALTH_CHECK` command.
>
> Thanks,
> Cong Zhao
>
> On 2022/06/22 19:06:25 Michael Marshall wrote:
> > I'd like to clarify the motivation for this PIP. My understanding is
> > that the primary motivation is to give clients a robust way to
> > classify a cluster as "healthy". The initial beneficiary of this
> > feature is the auto failover use case. I think the feature makes
> > sense, but before using the broker's concept of "healthy" as defined
> > in the broker health check, I think we should define what constitutes
> > a "healthy cluster" from the client's perspective.
> >
> > I define a client's primary cluster as "healthy" when it is "healthy"
> > for all of its producers and consumers. I define a healthy producer as
> > one that can connect to a topic and publish messages within certain
> > latency and throughput thresholds (configured by the user), and I
> > define a healthy consumer as one that can connect to a topic and
> > consume messages when there are messages to be consumed (possibly
> > within a certain latency?).
> >
> > By the above definitions, I don't think the broker's health check will
> > give us the right notion of "healthy" because that health check
> > monitors producing/consuming to/from the health check topic, not the
> > client's target topics. One primary difference is that a health check
> > topic could have a different persistence policy, which means the
> > client could incorrectly classify the broker as healthy when there
> > aren't enough available bookies for a producer's target topic.
> >
> > The broker health check also includes checks that we probably don't
> > want to use to classify whole clusters as "unhealthy". For example, if
> > the broker is deadlocked, it will be considered unhealthy. In
> > Kubernetes, that broker will be restarted "soon", and the topics will
> > be scheduled to another broker. I probably wouldn't consider a
> > whole cluster as "unhealthy" because a single broker was deadlocked.
> > Instead, I'd consider a cluster unhealthy when latency/throughput are
> > not meeting expectations, which could happen because a broker is
> > deadlocked. Further, there is a chance that the deadlock in the broker
> > didn't affect the client's producers and consumers, which is yet
> > another reason not to failover to another cluster based on a failed
> > broker health check.
> >
> > I look forward to hearing your definitions of client health.
> >
> > Thanks,
> > Michael
> >
> >
> >
> > On Wed, Jun 22, 2022 at 8:30 AM Cong Zhao <zh...@apache.org> wrote:
> > >
> > > Yes, there may have multiple clients request the HC at the same time in the AutoFailover case, so we should add some cache to reduce broker load.
> > >
> > > On 2022/06/22 12:55:49 Enrico Olivelli wrote:
> > > > Il giorno mer 22 giu 2022 alle ore 14:45 Cong Zhao
> > > > <zh...@apache.org> ha scritto:
> > > > >
> > > > > Hi Enrico,
> > > > >
> > > > > > Also, I would like to understand in which usecase you can use the
> > > > > > binary endpoint and not the HTTP endpoint.
> > > > >
> > > > > We can't use the HTTP endpoint when the client did not have the admin auth to do a health check. but we need it in some cases such as auto failover on the client-side (https://github.com/apache/pulsar/pull/13316#discussion_r773313991)
> > > > AutoFailover is a valid use case for me.
> > > > Thanks
> > > >
> > > > >
> > > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > > "client application" to run the HC
> > > > >
> > > > > Adding a health check API to the client-side just to make it easier to use this feature, this check still works on broker.
> > > >
> > > > makes sense, but usually the HC, like in k8s or in other environments
> > > > is run every X seconds and usually not concurrently
> > > >
> > > > if you have multiple (tens? hundreds?) of Pulsar clients that require
> > > > the HC, this will be a big problem,
> > > > is this the reason why you want to add some cache to the response of the HC ?
> > > >
> > > > Enrico
> > > >
> > > >
> > > > >
> > > > >
> > > > > On 2022/06/22 10:19:52 Enrico Olivelli wrote:
> > > > > > I believe that this proposal is too broad.
> > > > > >
> > > > > > the PIP reads about:
> > > > > > - adding HEALTHCHECK to the binary protocol
> > > > > > - add a HEALTHCHECK cache on the broker
> > > > > >
> > > > > > Also, I would like to understand in which usecase you can use the
> > > > > > binary endpoint and not the HTTP endpoint.
> > > > > >
> > > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > > "client application" to run the HC
> > > > > >
> > > > > > Can you please illustrate some practical use cases?
> > > > > >
> > > > > > Enric
> > > > > >
> > > > > > Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
> > > > > > ha scritto:
> > > > > > >
> > > > > > > Hello Pulsar Community,
> > > > > > >
> > > > > > >
> > > > > > > Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> > > > > > > I look forward to your feedback.
> > > > > > >
> > > > > > >
> > > > > > > PIP: https://github.com/apache/pulsar/issues/15859
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Cong Zhao
> > > > > >
> > > >
> >

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Cong Zhao <zh...@apache.org>.

Hi Michael,

Thanks for your feedback.

> I define a client's primary cluster as "healthy" when it is "healthy"
for all of its producers and consumers. I define a healthy producer as
one that can connect to a topic and publish messages within certain
latency and throughput thresholds (configured by the user), and I
define a healthy consumer as one that can connect to a topic and
consume messages when there are messages to be consumed (possibly
within a certain latency?).

This is a good definition of cluster health, but we can't check all topics that would add a lot of load on cleint and broker.

> By the above definitions, I don't think the broker's health check will
give us the right notion of "healthy" because that health check
monitors producing/consuming to/from the health check topic, not the
client's target topics. One primary difference is that a health check
topic could have a different persistence policy, which means the
client could incorrectly classify the broker as healthy when there
aren't enough available bookies for a producer's target topic.

This proposal mainly provides a means to check whether there is available topic in the cluster, and I think this is meaningful in most cases.

I think if the client implementation doesn't meet the user's needs, they can also override the `healthCheck` method based on the `HEALTH_CHECK` command.

Thanks,
Cong Zhao

On 2022/06/22 19:06:25 Michael Marshall wrote:
> I'd like to clarify the motivation for this PIP. My understanding is
> that the primary motivation is to give clients a robust way to
> classify a cluster as "healthy". The initial beneficiary of this
> feature is the auto failover use case. I think the feature makes
> sense, but before using the broker's concept of "healthy" as defined
> in the broker health check, I think we should define what constitutes
> a "healthy cluster" from the client's perspective.
> 
> I define a client's primary cluster as "healthy" when it is "healthy"
> for all of its producers and consumers. I define a healthy producer as
> one that can connect to a topic and publish messages within certain
> latency and throughput thresholds (configured by the user), and I
> define a healthy consumer as one that can connect to a topic and
> consume messages when there are messages to be consumed (possibly
> within a certain latency?).
> 
> By the above definitions, I don't think the broker's health check will
> give us the right notion of "healthy" because that health check
> monitors producing/consuming to/from the health check topic, not the
> client's target topics. One primary difference is that a health check
> topic could have a different persistence policy, which means the
> client could incorrectly classify the broker as healthy when there
> aren't enough available bookies for a producer's target topic.
> 
> The broker health check also includes checks that we probably don't
> want to use to classify whole clusters as "unhealthy". For example, if
> the broker is deadlocked, it will be considered unhealthy. In
> Kubernetes, that broker will be restarted "soon", and the topics will
> be scheduled to another broker. I probably wouldn't consider a
> whole cluster as "unhealthy" because a single broker was deadlocked.
> Instead, I'd consider a cluster unhealthy when latency/throughput are
> not meeting expectations, which could happen because a broker is
> deadlocked. Further, there is a chance that the deadlock in the broker
> didn't affect the client's producers and consumers, which is yet
> another reason not to failover to another cluster based on a failed
> broker health check.
> 
> I look forward to hearing your definitions of client health.
> 
> Thanks,
> Michael
> 
> 
> 
> On Wed, Jun 22, 2022 at 8:30 AM Cong Zhao <zh...@apache.org> wrote:
> >
> > Yes, there may have multiple clients request the HC at the same time in the AutoFailover case, so we should add some cache to reduce broker load.
> >
> > On 2022/06/22 12:55:49 Enrico Olivelli wrote:
> > > Il giorno mer 22 giu 2022 alle ore 14:45 Cong Zhao
> > > <zh...@apache.org> ha scritto:
> > > >
> > > > Hi Enrico,
> > > >
> > > > > Also, I would like to understand in which usecase you can use the
> > > > > binary endpoint and not the HTTP endpoint.
> > > >
> > > > We can't use the HTTP endpoint when the client did not have the admin auth to do a health check. but we need it in some cases such as auto failover on the client-side (https://github.com/apache/pulsar/pull/13316#discussion_r773313991)
> > > AutoFailover is a valid use case for me.
> > > Thanks
> > >
> > > >
> > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > "client application" to run the HC
> > > >
> > > > Adding a health check API to the client-side just to make it easier to use this feature, this check still works on broker.
> > >
> > > makes sense, but usually the HC, like in k8s or in other environments
> > > is run every X seconds and usually not concurrently
> > >
> > > if you have multiple (tens? hundreds?) of Pulsar clients that require
> > > the HC, this will be a big problem,
> > > is this the reason why you want to add some cache to the response of the HC ?
> > >
> > > Enrico
> > >
> > >
> > > >
> > > >
> > > > On 2022/06/22 10:19:52 Enrico Olivelli wrote:
> > > > > I believe that this proposal is too broad.
> > > > >
> > > > > the PIP reads about:
> > > > > - adding HEALTHCHECK to the binary protocol
> > > > > - add a HEALTHCHECK cache on the broker
> > > > >
> > > > > Also, I would like to understand in which usecase you can use the
> > > > > binary endpoint and not the HTTP endpoint.
> > > > >
> > > > > Health Check is good for scripts and for probes, I don't expect a
> > > > > "client application" to run the HC
> > > > >
> > > > > Can you please illustrate some practical use cases?
> > > > >
> > > > > Enric
> > > > >
> > > > > Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
> > > > > ha scritto:
> > > > > >
> > > > > > Hello Pulsar Community,
> > > > > >
> > > > > >
> > > > > > Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> > > > > > I look forward to your feedback.
> > > > > >
> > > > > >
> > > > > > PIP: https://github.com/apache/pulsar/issues/15859
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Cong Zhao
> > > > >
> > >
>

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Michael Marshall <mm...@apache.org>.

I'd like to clarify the motivation for this PIP. My understanding is
that the primary motivation is to give clients a robust way to
classify a cluster as "healthy". The initial beneficiary of this
feature is the auto failover use case. I think the feature makes
sense, but before using the broker's concept of "healthy" as defined
in the broker health check, I think we should define what constitutes
a "healthy cluster" from the client's perspective.

I define a client's primary cluster as "healthy" when it is "healthy"
for all of its producers and consumers. I define a healthy producer as
one that can connect to a topic and publish messages within certain
latency and throughput thresholds (configured by the user), and I
define a healthy consumer as one that can connect to a topic and
consume messages when there are messages to be consumed (possibly
within a certain latency?).

By the above definitions, I don't think the broker's health check will
give us the right notion of "healthy" because that health check
monitors producing/consuming to/from the health check topic, not the
client's target topics. One primary difference is that a health check
topic could have a different persistence policy, which means the
client could incorrectly classify the broker as healthy when there
aren't enough available bookies for a producer's target topic.

The broker health check also includes checks that we probably don't
want to use to classify whole clusters as "unhealthy". For example, if
the broker is deadlocked, it will be considered unhealthy. In
Kubernetes, that broker will be restarted "soon", and the topics will
be scheduled to another broker. I probably wouldn't consider a
whole cluster as "unhealthy" because a single broker was deadlocked.
Instead, I'd consider a cluster unhealthy when latency/throughput are
not meeting expectations, which could happen because a broker is
deadlocked. Further, there is a chance that the deadlock in the broker
didn't affect the client's producers and consumers, which is yet
another reason not to failover to another cluster based on a failed
broker health check.

I look forward to hearing your definitions of client health.

Thanks,
Michael

On Wed, Jun 22, 2022 at 8:30 AM Cong Zhao <zh...@apache.org> wrote:
>
> Yes, there may have multiple clients request the HC at the same time in the AutoFailover case, so we should add some cache to reduce broker load.
>
> On 2022/06/22 12:55:49 Enrico Olivelli wrote:
> > Il giorno mer 22 giu 2022 alle ore 14:45 Cong Zhao
> > <zh...@apache.org> ha scritto:
> > >
> > > Hi Enrico,
> > >
> > > > Also, I would like to understand in which usecase you can use the
> > > > binary endpoint and not the HTTP endpoint.
> > >
> > > We can't use the HTTP endpoint when the client did not have the admin auth to do a health check. but we need it in some cases such as auto failover on the client-side (https://github.com/apache/pulsar/pull/13316#discussion_r773313991)
> > AutoFailover is a valid use case for me.
> > Thanks
> >
> > >
> > > > Health Check is good for scripts and for probes, I don't expect a
> > > > "client application" to run the HC
> > >
> > > Adding a health check API to the client-side just to make it easier to use this feature, this check still works on broker.
> >
> > makes sense, but usually the HC, like in k8s or in other environments
> > is run every X seconds and usually not concurrently
> >
> > if you have multiple (tens? hundreds?) of Pulsar clients that require
> > the HC, this will be a big problem,
> > is this the reason why you want to add some cache to the response of the HC ?
> >
> > Enrico
> >
> >
> > >
> > >
> > > On 2022/06/22 10:19:52 Enrico Olivelli wrote:
> > > > I believe that this proposal is too broad.
> > > >
> > > > the PIP reads about:
> > > > - adding HEALTHCHECK to the binary protocol
> > > > - add a HEALTHCHECK cache on the broker
> > > >
> > > > Also, I would like to understand in which usecase you can use the
> > > > binary endpoint and not the HTTP endpoint.
> > > >
> > > > Health Check is good for scripts and for probes, I don't expect a
> > > > "client application" to run the HC
> > > >
> > > > Can you please illustrate some practical use cases?
> > > >
> > > > Enric
> > > >
> > > > Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
> > > > ha scritto:
> > > > >
> > > > > Hello Pulsar Community,
> > > > >
> > > > >
> > > > > Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> > > > > I look forward to your feedback.
> > > > >
> > > > >
> > > > > PIP: https://github.com/apache/pulsar/issues/15859
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Cong Zhao
> > > >
> >

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Cong Zhao <zh...@apache.org>.

Yes, there may have multiple clients request the HC at the same time in the AutoFailover case, so we should add some cache to reduce broker load.

On 2022/06/22 12:55:49 Enrico Olivelli wrote:
> Il giorno mer 22 giu 2022 alle ore 14:45 Cong Zhao
> <zh...@apache.org> ha scritto:
> >
> > Hi Enrico,
> >
> > > Also, I would like to understand in which usecase you can use the
> > > binary endpoint and not the HTTP endpoint.
> >
> > We can't use the HTTP endpoint when the client did not have the admin auth to do a health check. but we need it in some cases such as auto failover on the client-side (https://github.com/apache/pulsar/pull/13316#discussion_r773313991)
> AutoFailover is a valid use case for me.
> Thanks
> 
> >
> > > Health Check is good for scripts and for probes, I don't expect a
> > > "client application" to run the HC
> >
> > Adding a health check API to the client-side just to make it easier to use this feature, this check still works on broker.
> 
> makes sense, but usually the HC, like in k8s or in other environments
> is run every X seconds and usually not concurrently
> 
> if you have multiple (tens? hundreds?) of Pulsar clients that require
> the HC, this will be a big problem,
> is this the reason why you want to add some cache to the response of the HC ?
> 
> Enrico
> 
> 
> >
> >
> > On 2022/06/22 10:19:52 Enrico Olivelli wrote:
> > > I believe that this proposal is too broad.
> > >
> > > the PIP reads about:
> > > - adding HEALTHCHECK to the binary protocol
> > > - add a HEALTHCHECK cache on the broker
> > >
> > > Also, I would like to understand in which usecase you can use the
> > > binary endpoint and not the HTTP endpoint.
> > >
> > > Health Check is good for scripts and for probes, I don't expect a
> > > "client application" to run the HC
> > >
> > > Can you please illustrate some practical use cases?
> > >
> > > Enric
> > >
> > > Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
> > > ha scritto:
> > > >
> > > > Hello Pulsar Community,
> > > >
> > > >
> > > > Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> > > > I look forward to your feedback.
> > > >
> > > >
> > > > PIP: https://github.com/apache/pulsar/issues/15859
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Cong Zhao
> > >
>

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Enrico Olivelli <eo...@gmail.com>.

Il giorno mer 22 giu 2022 alle ore 14:45 Cong Zhao
<zh...@apache.org> ha scritto:
>
> Hi Enrico,
>
> > Also, I would like to understand in which usecase you can use the
> > binary endpoint and not the HTTP endpoint.
>
> We can't use the HTTP endpoint when the client did not have the admin auth to do a health check. but we need it in some cases such as auto failover on the client-side (https://github.com/apache/pulsar/pull/13316#discussion_r773313991)
AutoFailover is a valid use case for me.
Thanks

>
> > Health Check is good for scripts and for probes, I don't expect a
> > "client application" to run the HC
>
> Adding a health check API to the client-side just to make it easier to use this feature, this check still works on broker.

makes sense, but usually the HC, like in k8s or in other environments
is run every X seconds and usually not concurrently

if you have multiple (tens? hundreds?) of Pulsar clients that require
the HC, this will be a big problem,
is this the reason why you want to add some cache to the response of the HC ?

Enrico


>
>
> On 2022/06/22 10:19:52 Enrico Olivelli wrote:
> > I believe that this proposal is too broad.
> >
> > the PIP reads about:
> > - adding HEALTHCHECK to the binary protocol
> > - add a HEALTHCHECK cache on the broker
> >
> > Also, I would like to understand in which usecase you can use the
> > binary endpoint and not the HTTP endpoint.
> >
> > Health Check is good for scripts and for probes, I don't expect a
> > "client application" to run the HC
> >
> > Can you please illustrate some practical use cases?
> >
> > Enric
> >
> > Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
> > ha scritto:
> > >
> > > Hello Pulsar Community,
> > >
> > >
> > > Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> > > I look forward to your feedback.
> > >
> > >
> > > PIP: https://github.com/apache/pulsar/issues/15859
> > >
> > >
> > > Thanks,
> > >
> > > Cong Zhao
> >

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Cong Zhao <zh...@apache.org>.

Hi Enrico,

> Also, I would like to understand in which usecase you can use the
> binary endpoint and not the HTTP endpoint.

We can't use the HTTP endpoint when the client did not have the admin auth to do a health check. but we need it in some cases such as auto failover on the client-side (https://github.com/apache/pulsar/pull/13316#discussion_r773313991)

> Health Check is good for scripts and for probes, I don't expect a
> "client application" to run the HC

Adding a health check API to the client-side just to make it easier to use this feature, this check still works on broker.


On 2022/06/22 10:19:52 Enrico Olivelli wrote:
> I believe that this proposal is too broad.
> 
> the PIP reads about:
> - adding HEALTHCHECK to the binary protocol
> - add a HEALTHCHECK cache on the broker
> 
> Also, I would like to understand in which usecase you can use the
> binary endpoint and not the HTTP endpoint.
> 
> Health Check is good for scripts and for probes, I don't expect a
> "client application" to run the HC
> 
> Can you please illustrate some practical use cases?
> 
> Enric
> 
> Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
> ha scritto:
> >
> > Hello Pulsar Community,
> >
> >
> > Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> > I look forward to your feedback.
> >
> >
> > PIP: https://github.com/apache/pulsar/issues/15859
> >
> >
> > Thanks,
> >
> > Cong Zhao
>

Re: [DISCUSS] PIP-172: Introduce the HEALTH_CHECK command in the binary protocol

Posted by Enrico Olivelli <eo...@gmail.com>.

I believe that this proposal is too broad.

the PIP reads about:
- adding HEALTHCHECK to the binary protocol
- add a HEALTHCHECK cache on the broker

Also, I would like to understand in which usecase you can use the
binary endpoint and not the HTTP endpoint.

Health Check is good for scripts and for probes, I don't expect a
"client application" to run the HC

Can you please illustrate some practical use cases?

Enric

Il giorno mer 8 giu 2022 alle ore 05:22 zhaocong <zh...@apache.org>
ha scritto:
>
> Hello Pulsar Community,
>
>
> Here is a PIP to introduce the HEALTH_CHECK command in the binary protocol.
> I look forward to your feedback.
>
>
> PIP: https://github.com/apache/pulsar/issues/15859
>
>
> Thanks,
>
> Cong Zhao