Posted to dev@pulsar.apache.org by Dave Fisher <wa...@apache.org> on 2023/06/06 17:00:16 UTC

Re: [DISCUSS] PIP-264: Enhanced OTel-based metric system

Hi Asaf,

I just watched your presentation from Pulsar Summit: https://www.youtube.com/watch?v=NN4KNK2r4mo

Just to be clear to all: the phrase you used, “we decided”, along with the timeline, is not a Pulsar community “we”, but a StreamNative corporate “we”. This discussion may, or may not, become a Pulsar community “we”.

I would find this plan much more reasonable if it were broken down as follows.

(1) Implement and prove the use of OTel as an option for metrics as they work now. Let this community agree with you from experience.

(2) Have a discussion in the Pulsar community about notions like Topic Group and Bundles, so that we have the best and most operational logic around reducing cardinality and controlling load and performance. As long as we have bundles and orthogonally add groups, we introduce complexity. Topic Groups are useless unless there is some understanding of Bundles and Load.

(3) Perhaps the issue of hundreds of metrics should be addressed by being intentional about which metrics are helpful, and then slowly switching to those that the whole community finds most helpful in their operations.

Only by doing these three steps carefully, in the open on this list and in the community, can there be enough consensus that the whole change is acceptable for Pulsar 4.0 in 18 months.

Best,
Dave

> On May 21, 2023, at 9:00 AM, Asaf Mesika <as...@gmail.com> wrote:
> 
> Thanks for the reply, Enrico.
> Completely agree.
> This made me realize my TL;DR wasn't talking about export.
> I added this to it:
> 
> ---
> Pulsar OTel Metrics will support exporting as a Prometheus HTTP endpoint
> (`/metrics`, but on a different port) for backward compatibility, and also
> OTLP, so you can push the metrics to an OTel Collector and from there ship
> them to any destination.
> ---
> 
> OTel supports two kinds of exporters: Prometheus (HTTP pull) and OTLP (push).
> We'll just configure OTel to use them.
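To make the OTLP path described above concrete, here is a minimal OpenTelemetry Collector configuration sketch. The ports and endpoints are illustrative defaults, not taken from the PIP: it receives metrics pushed over OTLP and re-exposes them for Prometheus scraping.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # brokers push OTLP metrics here

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889       # Prometheus scrapes the Collector here

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```

From the Collector, additional exporters (e.g. a vendor backend) can be added to the same pipeline to ship the metrics to any destination.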
> 
> 
> 
> On Mon, May 15, 2023 at 10:35 AM Enrico Olivelli <eo...@gmail.com>
> wrote:
> 
>> Asaf,
>> thanks for contributing in this area.
>> Metrics are a fundamental feature of Pulsar.
>> 
>> Currently I find it very awkward to maintain metrics, and also I see
>> it as a problem to support only Prometheus.
>> 
>> Regarding your proposal, IIRC in the past someone else proposed to
>> support other metrics systems and they have been suggested to use a
>> sidecar approach,
>> that is to add something next to Pulsar services that served the
>> metrics in the preferred format/way.
>> I find that the sidecar approach is too inefficient and I am not
>> proposing it (but I wanted to add this reference for the benefit of
>> new people on the list).
>> 
>> I wonder if it would be possible to keep compatibility with the
>> current Prometheus based metrics.
>> Pulsar has now reached a point at which it is widely used by many
>> companies, often with big clusters.
>> Telling people that they have to rework all of their metrics-related
>> infrastructure, because we don't support Prometheus anymore or because we
>> radically changed the way we publish metrics,
>> is a step that seems too hard from my point of view.
>> 
>> Currently I believe that compatibility is more important than
>> versatility, and if we want to introduce new (and far better) features
>> we must take it into account.
>> 
>> So my point is that I generally support the idea of opening the way to
>> Open Telemetry, but we must have a way to not force all of our users
>> to throw away their alerting systems, dashboards and know-how in
>> troubleshooting Pulsar problems in production and dev
>> 
>> Best regards
>> Enrico
>> 
>> On Mon, May 15, 2023 at 02:17 Dave Fisher
>> <wa...@comcast.net> wrote:
>>> 
>>> 
>>> 
>>>> On May 10, 2023, at 1:01 AM, Asaf Mesika <as...@gmail.com>
>> wrote:
>>>> 
>>>> On Tue, May 9, 2023 at 11:29 PM Dave Fisher <wa...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> 
>>>>>>> On May 8, 2023, at 2:49 AM, Asaf Mesika <as...@gmail.com>
>> wrote:
>>>>>> 
>>>>>> Your feedback made me realize I need to add a "TL;DR" section, which I
>>>>>> just added.
>>>>>> 
>>>>>> I'm quoting it here. It gives a brief summary of the proposal, which
>>>>>> requires up to 5 min of read time, helping you get a high level
>> picture
>>>>>> before you dive into the background/motivation/solution.
>>>>>> 
>>>>>> ----------------------
>>>>>> TL;DR
>>>>>> 
>>>>>> Working with Metrics today as a user or a developer is hard and has
>> many
>>>>>> severe issues.
>>>>>> 
>>>>>> From the user perspective:
>>>>>> 
>>>>>> - One of Pulsar's strongest features is "cheap" topics, so you can
>>>>>> easily have 10k - 100k topics per broker. Once you do that, you
>>>>>> quickly learn that the amount of metrics you export via "/metrics"
>>>>>> (Prometheus-style endpoint) becomes really big. The cost to store
>>>>>> them becomes too high, queries time out, or even the "/metrics"
>>>>>> endpoint itself times out.
>>>>>> The only option Pulsar gives you today is all-or-nothing filtering and
>>>>>> very crude aggregation. You switch metrics from topic aggregation
>>>>>> level to namespace aggregation level. Also, you can turn off producer
>>>>>> and consumer level metrics. You end up doing it all, leaving you
>>>>>> "blind", looking at the metrics from a namespace level, which is too
>>>>>> high level. You end up conjuring all kinds of scripts on top of the
>>>>>> topic stats endpoint to glue together some aggregated metrics view for
>>>>>> the topics you need.
>>>>>> - Summaries (a metric type giving you quantiles like p95), which are
>>>>>> used in Pulsar, can't be aggregated across topics / brokers due to
>>>>>> their inherent design.
>>>>>> - Plugin authors spend too much time on defining and exposing metrics
>>>>>> to Pulsar, since the only interface Pulsar offers is writing your
>>>>>> metrics yourself, as UTF-8 bytes in Prometheus Text Format, to a byte
>>>>>> stream interface given to you.
>>>>>> - Pulsar histograms are exported in a way that is not conformant with
>>>>>> Prometheus, which means you can't get the p95 quantile on such
>>>>>> histograms, making them very hard to use in day-to-day life.
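To illustrate the summary-aggregation point above with a hypothetical sketch (not Pulsar code): pre-computed per-broker quantiles cannot be combined into a correct cluster-wide quantile, whereas raw observations (or histogram bucket counts) merge exactly.

```python
import random

def p95(values):
    """Naive p95: the value at the 95th-percentile position of sorted data."""
    s = sorted(values)
    return s[int(0.95 * (len(s) - 1))]

random.seed(0)
fast_broker = [random.uniform(0, 10) for _ in range(1000)]    # latencies, ms
slow_broker = [random.uniform(0, 100) for _ in range(1000)]

true_p95 = p95(fast_broker + slow_broker)                 # merge raw data: correct
avg_of_p95s = (p95(fast_broker) + p95(slow_broker)) / 2   # merge summaries: wrong

# The averaged per-broker p95 badly underestimates the real cluster-wide p95.
print(f"true p95 ~ {true_p95:.1f}, averaged p95s ~ {avg_of_p95s:.1f}")
```

This is exactly why a summary exported per broker cannot be rolled up, while bucketized histograms can.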
>>>>> 
>>>>> What version of DataSketches is used to produce the histogram? Is it
>>>>> still an old Yahoo one, or are we using an updated one from Apache
>>>>> DataSketches?
>>>>> 
>>>>> Seems like this is a single PR/small PIP for 3.1?
>>>> 
>>>> 
>>>> Histograms are a list of buckets, each is a counter.
>>>> Summary is a collection of values collected over a time window, which
>> at
>>>> the end you get a calculation of the quantiles of those values: p95,
>> p50,
>>>> and those are exported from Pulsar.
>>>> 
>>>> Pulsar histograms do not use DataSketches.
>>> 
>>> Bookkeeper Metrics wraps Yahoo DataSketches last I checked.
>>> 
>>>> They are just counters.
>>>> They do not adhere to Prometheus conventions since:
>>>> a. The counter is expected to be cumulative, but Pulsar resets each
>>>> bucket counter to 0 every 1 min.
>>>> b. The bucket upper bound is expected to be written as an attribute
>>>> "le", but today it is encoded in the name of the metric itself.
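As a sketch of the conformant format described in (a) and (b) above (the metric name and bucket bounds are invented for illustration, not actual Pulsar metrics): cumulative bucket counters carrying the upper bound in the `le` label render like this.

```python
def prometheus_histogram_lines(name, bucket_counts, upper_bounds):
    """Render per-bucket counts as Prometheus-conformant cumulative
    '_bucket' series, carrying the upper bound in the 'le' label."""
    lines, cumulative = [], 0
    for bound, count in zip(upper_bounds, bucket_counts):
        cumulative += count  # each bucket includes all smaller buckets
        lines.append(f'{name}_bucket{{le="{bound}"}} {cumulative}')
    # The '+Inf' bucket equals the total observation count.
    lines.append(f'{name}_bucket{{le="+Inf"}} {cumulative}')
    return lines

# Per-interval counts for buckets <=0.5ms, <=1ms, <=5ms:
for line in prometheus_histogram_lines("write_latency_ms", [3, 2, 5],
                                       ["0.5", "1", "5"]):
    print(line)
```

With this shape, PromQL's `histogram_quantile()` can compute p95 across topics and brokers by summing the `_bucket` series, which is impossible when the bound is baked into the metric name.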
>>>> 
>>>> This is a breaking change, hence hard to make in any small release.
>>>> This is why it's part of this PIP: so many things will break, and all
>>>> of them will break in a separate layer (OTel metrics), hence not
>>>> breaking anyone without their consent.
>>> 
>>> If this change will break existing Grafana dashboards and other
>> operational monitoring already in place then it will break guarantees we
>> have made about safely being able to downgrade from a bad upgrade.
>>> 
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>>> - Too many metrics are rates, which also delta-reset every interval
>>>>>> you configure in Pulsar and on restart, instead of relying on
>>>>>> cumulative (ever-growing) counters and letting Prometheus use its
>>>>>> rate function.
>>>>>> - and many more issues
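The cumulative-counter approach mentioned above can be sketched with a hypothetical helper that mimics what Prometheus's `rate()` does, including tolerating counter resets after a broker restart (all names here are invented for illustration).

```python
def per_second_rate(samples):
    """samples: list of (timestamp_seconds, cumulative_counter_value).
    Returns the average per-second increase, treating any decrease
    as a counter reset (the counter restarted from zero)."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # On reset, count the post-reset value as the increase.
        increase += cur - prev if cur >= prev else cur
    window = samples[-1][0] - samples[0][0]
    return increase / window

# Counter grows, then the broker restarts at t=60 (counter drops to 10):
print(per_second_rate([(0, 100), (30, 160), (60, 10)]))
```

With ever-growing counters, the server stays stateless and the backend handles windows and restarts; delta-resetting inside Pulsar bakes one fixed interval into the data instead.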
>>>>>> 
>>>>>> From the developer perspective:
>>>>>> 
>>>>>> - There are 4 different ways to define and record metrics in Pulsar:
>>>>>> Pulsar's own metrics library, Prometheus Java Client, Bookkeeper
>>>>>> metrics library, and plain native Java SDK objects (AtomicLong, ...).
>>>>>> It's very confusing for the developer and creates inconsistencies for
>>>>>> the end user (e.g. a Summary is different in each).
>>>>>> - Patching your metrics into the "/metrics" Prometheus endpoint is
>>>>>> confusing, cumbersome, and error-prone.
>>>>>> - many more
>>>>>> 
>>>>>> This proposal offers several key changes to solve that:
>>>>>> 
>>>>>> - Cardinality (supporting 10k-100k topics per broker) is solved by
>>>>>> introducing a new aggregation level for metrics called Topic Metric
>>>>> Group.
>>>>>> Using configuration, you specify for each topic its group (using
>>>>>> wildcard/regex). This lets you "zoom out" from topics to groups, a
>>>>>> granularity level still more detailed than namespaces. Since you
>>>>>> control how many groups you'll have, this solves the cardinality
>>>>>> issue without sacrificing too much level of detail.
>>>>>> - Fine-grained filtering mechanism, dynamic. You'll have rule-based
>>>>>> dynamic configuration, allowing you to specify per
>>>>> namespace/topic/group
>>>>>> which metrics you'd like to keep/drop. Rules allow you to set the
>>>>>> default to a small amount of metrics at group and namespace level only,
>>>>>> and drop the rest. When needed, you can add an override rule to "open"
>>>>>> up a certain group to have more metrics at higher granularity (topic or
>>>>>> even consumer/producer level). Since it's dynamic, you "open" such a
>>>>>> group when you see it's misbehaving, see it at topic level, and when
>>>>>> all is resolved, you can "close" it. It's an experience a bit similar
>>>>>> to logging levels in Log4j or Logback, where you set a default and
>>>>>> override per class/package.
>>>>>> 
>>>>>> Aggregation and Filtering combined solve the cardinality issue without
>>>>>> sacrificing the level of detail when needed, and most importantly, you
>>>>>> determine for which topic/group/namespace it happens.
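A hypothetical sketch of how the two mechanisms described above could compose (all rule syntax and names here are invented for illustration; the PIP defines the real configuration): topics map to groups by regex, and ordered filter rules pick the export granularity, with the first match winning, much like logger-level overrides.

```python
import re

# Topic -> group mapping; the first matching pattern wins (invented examples).
group_rules = [
    (re.compile(r"persistent://public/orders/.*"), "orders"),
    (re.compile(r"persistent://public/.*"), "public-misc"),
]

# Ordered filter rules: overrides first, catch-all default last.
filter_rules = [
    ("group:orders", "topic"),   # temporarily "opened" for debugging
    ("*", "group"),              # default: export group-level metrics only
]

def group_of(topic):
    for pattern, group in group_rules:
        if pattern.match(topic):
            return group
    return "default"

def granularity_for(topic):
    group = group_of(topic)
    for selector, granularity in filter_rules:
        if selector == "*" or selector == f"group:{group}":
            return granularity
    return "group"

print(granularity_for("persistent://public/orders/incoming"))  # topic level
print(granularity_for("persistent://public/logs/app"))         # group level
```

Removing the override rule "closes" the group again, dropping it back to the group-level default without restarting anything.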
>>>>>> 
>>>>>> Since this change is so invasive, it requires a single metrics library
>>>>>> to implement all of it on top of; hence the third big change is
>>>>>> consolidating all four ways to define and record metrics into a single,
>>>>>> new one: OpenTelemetry Metrics (the Java SDK, and also Python and Go
>>>>>> for the Pulsar Function runners).
>>>>>> Introducing OpenTelemetry (OTel) also solves the biggest pain point
>>>>>> from the developer perspective, since it's a superb metrics library
>>>>>> offering everything you need, and there is going to be a single way -
>>>>>> only it. Also, it solves robustness for plugin authors, who will use
>>>>>> OpenTelemetry. It so happens that it also solves all the numerous
>>>>>> problems described in the doc itself.
>>>>>> 
>>>>>> The solution will be introduced as another layer with feature toggles,
>>>>>> so you can work with the existing system and/or OTel, until the
>>>>>> existing system is gradually deprecated.
>>>>>> 
>>>>>> It's a big breaking change for Pulsar users on many fronts: names,
>>>>>> semantics, configuration. Read the end of this doc to learn exactly
>>>>>> what will change for the user (at a high level).
>>>>>> 
>>>>>> In my opinion, it will make the Pulsar user experience so much better
>>>>>> that users will want to migrate to it, despite the breaking change.
>>>>>> 
>>>>>> This was a very short summary. You are most welcome to read the full
>>>>>> design document below and express feedback, so we can make it better.
>>>>>> 
>>>>>> On Sun, May 7, 2023 at 7:52 PM Asaf Mesika <as...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sun, May 7, 2023 at 4:23 PM Yunze Xu
>> <yz...@streamnative.io.invalid>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I was excited to learn much more about metrics when I started
>>>>>>>> reading this proposal. But I became more and more frustrated when I
>>>>>>>> found there was still too much content left even after I had already
>>>>>>>> spent much time reading it. I'm wondering how much time you expected
>>>>>>>> reviewers to take to read through this proposal. I just recalled the
>>>>>>>> discussion you started before [1]. Did you expect each PMC member
>>>>>>>> that gives his/her +1 to read only parts of this proposal?
>>>>>>>> 
>>>>>>> 
>>>>>>> I estimated around 2 hours needed for a reviewer.
>>>>>>> I hate it being so long, but I simply couldn't find a way to
>> downsize it
>>>>>>> more. Furthermore, I consulted with my colleagues including Matteo,
>> but
>>>>> we
>>>>>>> couldn't see a way to scope it down.
>>>>>>> Why? Because once you begin this journey, you need to know how it's
>>>>> going
>>>>>>> to end.
>>>>>>> What I ended up doing, is writing all the crucial details for
>> review in
>>>>>>> the High Level Design section.
>>>>>>> It's still a big, hefty section, but I don't think I can step out
>> or let
>>>>>>> anyone else change Pulsar so invasively without the full extent of
>> the
>>>>>>> change.
>>>>>>> 
>>>>>>> I don't think it's wise to read parts.
>>>>>>> I did my very best effort to minimize it, but the scope is simply
>> big.
>>>>>>> Open for suggestions, but it requires reading all the PIP :)
>>>>>>> 
>>>>>>> Thanks a lot, Yunze, for dedicating your time to it.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Let's talk back to the proposal, for now, what I mainly learned and
>>>>>>>> are concerned about mostly are:
>>>>>>>> 1. Pulsar has many ways to expose metrics. It's not unified, and it
>>>>>>>> is confusing.
>>>>>>>> 2. The current metrics system cannot support a large number of
>>>>>>>> topics.
>>>>>>>> 3. It's hard for plugin authors to integrate metrics. (For example,
>>>>>>>> KoP [2] integrates metrics by implementing the
>>>>>>>> PrometheusRawMetricsProvider interface and it indeed needs much
>> work)
>>>>>>>> 
>>>>>>>> Regarding the 1st issue, this proposal chooses OpenTelemetry
>> (OTel).
>>>>>>>> 
>>>>>>>> Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?"
>>>>>>>> section. It's still frustrating to see no answer. Eventually, I
>> found
>>>>>>>> 
>>>>>>> 
>>>>>>> OpenTelemetry isn't the solution for a large amount of topics.
>>>>>>> The solution is described in the
>>>>>>> "Aggregate and Filtering to solve cardinality issues" section.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> the explanation in the "What we need to fix in OpenTelemetry -
>>>>>>>> Performance" section. It seems that we still need some enhancements
>>>>>>>> in OTel. In other words, OTel is not currently ready to resolve all
>>>>>>>> the issues listed in the proposal, but we believe it will be.
>>>>>>>> 
>>>>>>> 
>>>>>>> Let me rephrase "believe" --> we work together with the maintainers
>>>>>>> to do it, yes.
>>>>>>> I am open to any other suggestion.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> As for the 3rd issue, from the "Integrating with Pulsar Plugins"
>>>>>>>> section, the plugin authors still need to implement the new OTel
>>>>>>>> interfaces. Is it much easier than using the existing ways to
>> expose
>>>>>>>> metrics? Could metrics still be easily integrated with Grafana?
>>>>>>>> 
>>>>>>> 
>>>>>>> Yes, it's way easier.
>>>>>>> Basically you have full-fledged metrics library objects: Meter, Gauge,
>>>>>>> Histogram, Counter.
>>>>>>> No more Raw Metrics Provider writing UTF-8 bytes in Prometheus format.
>>>>>>> You get namespacing for free with the Meter name and version.
>>>>>>> It's way better than the current solution and any other library.
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> That's all I am concerned about at the moment. I understand, and
>>>>>>>> appreciate that you've spent much time studying and explaining all
>>>>>>>> these things. But, this proposal is still too huge.
>>>>>>>> 
>>>>>>> 
>>>>>>> I appreciate your effort a lot!
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> [1]
>> https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
>>>>>>>> [2]
>>>>>>>> 
>>>>> 
>> https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Yunze
>>>>>>>> 
>>>>>>>> On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <as...@gmail.com>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> I would very much appreciate feedback from multiple Pulsar users and
>>>>>>>>> devs on this PIP, since it suggests dramatic changes and quite an
>>>>>>>>> extensive positive change for the users.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <
>> asaf.mesika@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> I'm very excited to release a PIP I've been working on for the past
>>>>>>>>>> 11 months, which I think will be immensely valuable to Pulsar, a
>>>>>>>>>> project I like so much.
>>>>>>>>>> 
>>>>>>>>>> PIP: https://github.com/apache/pulsar/issues/20197
>>>>>>>>>> 
>>>>>>>>>> I'm quoting here the preface:
>>>>>>>>>> 
>>>>>>>>>> === QUOTE START ===
>>>>>>>>>> 
>>>>>>>>>> Roughly 11 months ago, I started working on solving the biggest
>> issue
>>>>>>>> with
>>>>>>>>>> Pulsar metrics: the lack of ability to monitor a pulsar broker
>> with a
>>>>>>>> large
>>>>>>>>>> topic count: 10k, 100k, and future support of 1M. This started by
>>>>>>>> mapping
>>>>>>>>>> the existing functionality and then enumerating all the problems
>> I
>>>>>>>> saw (all
>>>>>>>>>> documented in this doc
>>>>>>>>>> <
>>>>>>>> 
>>>>> 
>> https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing
>>> 
>>> I thought we were going to stop using Google docs for PIPs.
>>> 
>>>>>>>>> 
>>>>>>>>>> ).
>>>>>>>>>> 
>>>>>>>>>> This PIP is a parent PIP. It aims to gradually solve (using
>> sub-PIPs)
>>>>>>>> all
>>>>>>>>>> the current metric system's problems and provide the ability to
>>>>>>>> monitor a
>>>>>>>>>> broker with a large topic count, which is currently lacking. As a
>>>>>>>> parent
>>>>>>>>>> PIP, it will describe each problem and its solution at a high
>> level,
>>>>>>>>>> leaving fine-grained details to the sub-PIPs. The parent PIP
>>>>>>>>>> ensures all solutions align and do not contradict each other.
>>>>>>>>>> 
>>>>>>>>>> The basic building block to solve monitoring for a large topic
>>>>>>>>>> count is aggregating internally (to topic groups) and adding
>>>>>>>>>> fine-grained filtering. We could have shoe-horned it into the
>>>>>>>>>> existing metric system, but we thought adding that to a system
>>>>>>>>>> already riddled with many problems would be wrong and hard to do
>>>>>>>>>> gradually, as so many things would break. This
>>>>>>>>>> is why the second-biggest design decision presented here is
>>>>>>>> consolidating
>>>>>>>>>> all existing metric libraries into a single one - OpenTelemetry
>>>>>>>>>> <https://opentelemetry.io/>. The parent PIP will explain why
>>>>>>>>>> OpenTelemetry was chosen out of existing solutions and why it far
>>>>>>>> exceeds
>>>>>>>>>> all other options. I’ve been working closely with the OpenTelemetry
>>>>>>>>>> community for the past eight months: brainstorming this integration
>>>>>>>>>> and raising issues, in an effort to remove serious blockers and
>>>>>>>>>> make this migration successful.
>>>>>>>>>> 
>>>>>>>>>> I made every effort to summarize this document so that it can be
>>>>>>>> concise
>>>>>>>>>> yet clear. I understand it is an effort to read it and, more so,
>>>>>>>> provide
>>>>>>>>>> meaningful feedback on such a large document; hence I’m very
>> grateful
>>>>>>>> for
>>>>>>>>>> each individual who does so.
>>>>>>>>>> 
>>>>>>>>>> I think this design will help improve the user experience
>> immensely,
>>>>>>>> so it
>>>>>>>>>> is worth the time spent reading it.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> === QUOTE END ===
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks!
>>>>>>>>>> 
>>>>>>>>>> Asaf Mesika
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 


Re: [DISCUSS] PIP-264: Enhanced OTel-based metric system

Posted by Matteo Merli <ma...@gmail.com>.
> (2) Have a discussion in the Pulsar community about notions like Topic
> Group and Bundles so that we have the best and most operational logic
> around reducing cardinality and controlling load and performance. As long
> as we have bundles and orthogonally add groups we introduce complexity.
> Topic Groups are useless unless there is some understanding of Bundles and
> Load.

Isn't this thread the "discussion"? It has been open for more than a month
to give people time to read and provide feedback on the proposal.

--
Matteo Merli
<ma...@gmail.com>


On Tue, Jun 6, 2023 at 10:00 AM Dave Fisher <wa...@apache.org> wrote:

> Hi Asaf,
>
> I just watched your presentation from Pulsar Summit.
> https://www.youtube.com/watch?v=NN4KNK2r4mo <
> https://www.youtube.com/watch?v=NN4KNK2r4mo>
>
> Just to be clear to all the phrase you used “we decided” along with the
> timeline is not a Pulsar community we, but a StreamNative corporate we.
> This discussion may, or may not become a Pulsar community we.
>
> I would find this plan much more reasonable if it were broken down as
> follows.
>
> (1) Implement and prove the use of OTel as an option for metrics as they
> work now. Let this community agree with you from experience.
>
> (2) Have a discussion in the Pulsar community about notions like Topic
> Group and Bundles so that we have the best and most operational logic
> around reducing cardinality and controlling load and performance. As long
> as we have bundles and orthogonally add groups we introduce complexity.
> Topic Groups are useless unless there is some understanding of Bundles and
> Load.
>
> (3) Perhaps the issues of 100s of metrics should be turned to being
> intentional about what metrics are helpful and then slowly switching to
> those that the whole community finds are most helpful in their operations.
>
> Only by doing these three steps carefully in the Open on this list in the
> Community can there be enough consensus that the whole change is acceptable
> for Pulsar 4.0 in 18 months.
>
> Best,
> Dave
>
> > On May 21, 2023, at 9:00 AM, Asaf Mesika <as...@gmail.com> wrote:
> >
> > Thanks for the reply, Enrico.
> > Completely agree.
> > This made me realize my TL;DR wasn't talking about export.
> > I added this to it:
> >
> > ---
> > Pulsar OTel Metrics will support exporting as Prometheus HTTP endpoint
> > (`/metrics` but different port) for backward compatibility and also OLTP,
> > so you can push the metrics to OTel Collector and from there ship it to
> any
> > destination.
> > ---
> >
> > OTel supports two kinds of exporter: Prometheus (HTTP) and OTLP (push).
> > We'll just configure to use them.
> >
> >
> >
> > On Mon, May 15, 2023 at 10:35 AM Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> >> Asaf,
> >> thanks for contributing in this area.
> >> Metrics are a fundamental feature of Pulsar.
> >>
> >> Currently I find it very awkward to maintain metrics, and also I see
> >> it as a problem to support only Prometheus.
> >>
> >> Regarding your proposal, IIRC in the past someone else proposed to
> >> support other metrics systems and they have been suggested to use a
> >> sidecar approach,
> >> that is to add something next to Pulsar services that served the
> >> metrics in the preferred format/way.
> >> I find that the sidecar approach is too inefficient and I am not
> >> proposing it (but I wanted to add this reference for the benefit of
> >> new people on the list).
> >>
> >> I wonder if it would be possible to keep compatibility with the
> >> current Prometheus based metrics.
> >> Now Pulsar reached a point in which is is widely used by many
> >> companies and also with big clusters,
> >> telling people that they have to rework all the infrastructure related
> >> to metrics because we don't support Prometheus anymore or because we
> >> changed radically the way we publish metrics
> >> It is a step that seems too hard from my point of view.
> >>
> >> Currently I believe that compatibility is more important than
> >> versatility, and if we want to introduce new (and far better) features
> >> we must take it into account.
> >>
> >> So my point is that I generally support the idea of opening the way to
> >> Open Telemetry, but we must have a way to not force all of our users
> >> to throw away their alerting systems, dashboards and know-how in
> >> troubleshooting Pulsar problems in production and dev
> >>
> >> Best regards
> >> Enrico
> >>
> >> Il giorno lun 15 mag 2023 alle ore 02:17 Dave Fisher
> >> <wa...@comcast.net> ha scritto:
> >>>
> >>>
> >>>
> >>>> On May 10, 2023, at 1:01 AM, Asaf Mesika <as...@gmail.com>
> >> wrote:
> >>>>
> >>>> On Tue, May 9, 2023 at 11:29 PM Dave Fisher <wa...@apache.org> wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>>>> On May 8, 2023, at 2:49 AM, Asaf Mesika <as...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> Your feedback made me realized I need to add "TL;DR" section, which
> I
> >>>>> just
> >>>>>> added.
> >>>>>>
> >>>>>> I'm quoting it here. It gives a brief summary of the proposal, which
> >>>>>> requires up to 5 min of read time, helping you get a high level
> >> picture
> >>>>>> before you dive into the background/motivation/solution.
> >>>>>>
> >>>>>> ----------------------
> >>>>>> TL;DR
> >>>>>>
> >>>>>> Working with Metrics today as a user or a developer is hard and has
> >> many
> >>>>>> severe issues.
> >>>>>>
> >>>>>> From the user perspective:
> >>>>>>
> >>>>>> - One of Pulsar strongest feature is "cheap" topics so you can
> >> easily
> >>>>>> have 10k - 100k topics per broker. Once you do that, you quickly
> >> learn
> >>>>> that
> >>>>>> the amount of metrics you export via "/metrics" (Prometheus style
> >>>>> endpoint)
> >>>>>> becomes really big. The cost to store them becomes too high, queries
> >>>>>> time-out or even "/metrics" endpoint it self times out.
> >>>>>> The only option Pulsar gives you today is all-or-nothing filtering
> >> and
> >>>>>> very crude aggregation. You switch metrics from topic aggregation
> >>>>> level to
> >>>>>> namespace aggregation level. Also you can turn off producer and
> >>>>> consumer
> >>>>>> level metrics. You end up doing it all leaving you "blind", looking
> >> at
> >>>>> the
> >>>>>> metrics from a namespace level which is too high level. You end up
> >>>>>> conjuring all kinds of scripts on top of topic stats endpoint to
> >> glue
> >>>>> some
> >>>>>> aggregated metrics view for the topics you need.
> >>>>>> - Summaries (metric type giving you quantiles like p95) which are
> >> used
> >>>>>> in Pulsar, can't be aggregated across topics / brokers due its
> >> inherent
> >>>>>> design.
> >>>>>> - Plugin authors spend too much time on defining and exposing
> >> metrics
> >>>>> to
> >>>>>> Pulsar since the only interface Pulsar offers is writing your
> >> metrics
> >>>>> by
> >>>>>> your self as UTF-8 bytes in Prometheus Text Format to byte stream
> >>>>> interface
> >>>>>> given to you.
> >>>>>> - Pulsar histograms are exported in a way that is not conformant
> >> with
> >>>>>> Prometheus, which means you can't get the p95 quantile on such
> >>>>> histograms,
> >>>>>> making them very hard to use in day to day life.
> >>>>>
> >>>>> What version of DataSketches is used to produce the histogram? Is is
> >> still
> >>>>> an old Yahoo one, or are we using an updated one from Apache
> >> DataSketches?
> >>>>>
> >>>>> Seems like this is a single PR/small PIP for 3.1?
> >>>>
> >>>>
> >>>> Histograms are a list of buckets, each is a counter.
> >>>> Summary is a collection of values collected over a time window, which
> >> at
> >>>> the end you get a calculation of the quantiles of those values: p95,
> >> p50,
> >>>> and those are exported from Pulsar.
> >>>>
> >>>> Pulsar histogram do not use Data Sketches.
> >>>
> >>> Bookkeeper Metrics wraps Yahoo DataSketches last I checked.
> >>>
> >>>> They are just counters.
> >>>> They are not adhere to Prometheus since:
> >>>> a. The counter is expected to be cumulative, but Pulsar resets each
> >> bucket
> >>>> counter to 0 every 1 min
> >>>> b. The bucket upper range is expected to be written as an attribute
> >> "le"
> >>>> but today it is encoded in the name of the metric itself.
> >>>>
> >>>> This is a breaking change, hence hard to mark in any small release.
> >>>> This is why it's part of this PIP since so many things will break, and
> >> all
> >>>> of them will break on a separate layer (OTel metrics), hence not break
> >>>> anyone without their consent.
> >>>
> >>> If this change will break existing Grafana dashboards and other
> >> operational monitoring already in place then it will break guarantees we
> >> have made about safely being able to downgrade from a bad upgrade.
> >>>
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>>
> >>>>>> - Too many metrics are rates which also delta reset every interval
> >> you
> >>>>>> configure in Pulsar and restart, instead of relying on cumulative
> >> (ever
> >>>>>> growing) counters and let Prometheus use its rate function.
> >>>>>> - and many more issues
> >>>>>>
> >>>>>> From the developer perspective:
> >>>>>>
> >>>>>> - There are 4 different ways to define and record metrics in Pulsar:
> >>>>>> Pulsar own metrics library, Prometheus Java Client, Bookkeeper
> >> metrics
> >>>>>> library and plain native Java SDK objects (AtomicLong, ...). It's
> >> very
> >>>>>> confusing for the developer and create inconsistencies for the end
> >> user
> >>>>>> (e.g. Summary for example is different in each).
> >>>>>> - Patching your metrics into "/metrics" Prometheus endpoint is
> >>>>>> confusing, cumbersome and error prone.
> >>>>>> - many more
> >>>>>>
> >>>>>> This proposal offers several key changes to solve that:
> >>>>>>
> >>>>>> - Cardinality (supporting 10k-100k topics per broker) is solved by
> >>>>>> introducing a new aggregation level for metrics called Topic Metric
> >>>>> Group.
> >>>>>> Using configuration, you specify for each topic its group (using
> >>>>>> wildcard/regex). This allows you to "zoom" out to a more detailed
> >>>>>> granularity level like groups instead of namespaces, which you
> >> control
> >>>>> how
> >>>>>> many groups you'll have hence solving the cardinality issue, without
> >>>>>> sacrificing level of detail too much.
> >>>>>> - Fine-grained filtering mechanism, dynamic. You'll have rule-based
> >>>>>> dynamic configuration, allowing you to specify per
> >>>>> namespace/topic/group
> >>>>>> which metrics you'd like to keep/drop. Rules allows you to set the
> >>>>> default
> >>>>>> to have small amount of metrics in group and namespace level only
> >> and
> >>>>> drop
> >>>>>> the rest. When needed, you can add an override rule to "open" up a
> >>>>> certain
> >>>>>> group to have more metrics in higher granularity (topic or even
> >>>>>> consumer/producer level). Since it's dynamic you "open" such a group
> >>>>> when
> >>>>>> you see it's misbehaving, see it in topic level, and when all
> >>>>> resolved, you
> >>>>>> can "close" it. A bit similar experience to logging levels in Log4j
> >> or
> >>>>>> Logback, that you default and override per class/package.
> >>>>>>
> >>>>>> Aggregation and Filtering combined solves the cardinality without
> >>>>>> sacrificing the level of detail when needed and most importantly,
> you
> >>>>>> determine which topic/group/namespace it happens on.
> >>>>>>
> >>>>>> Since this change is so invasive, it requires a single metrics
> >> library to
> >>>>>> implement all of it on top of; Hence the third big change point is
> >>>>>> consolidating all four ways to define and record metrics to a single
> >>>>> one, a
> >>>>>> new one: OpenTelemetry Metrics (Java SDK, and also Python and Go for
> >> the
> >>>>>> Pulsar Function runners).
> >>>>>> Introducing OpenTelemetry (OTel) solves also the biggest pain point
> >> from
> >>>>>> the developer perspective, since it's a superb metrics library
> >> offering
> >>>>>> everything you need, and there is going to be a single way - only
> it.
> >>>>> Also,
> >>>>>> it solves the robustness for Plugin author which will use
> >> OpenTelemetry.
> >>>>> It
> >>>>>> so happens that it also solves all the numerous problems described
> >> in the
> >>>>>> doc itself.
> >>>>>>
> >>>>>> The solution will be introduced as another layer with feature
> >> toggles, so
> >>>>>> you can work with existing system, and/or OTel, until gradually
> >>>>> deprecating
> >>>>>> existing system.
> >>>>>>
> >>>>>> It's a big breaking change for Pulsar users on many fronts: names,
> >>>>>> semantics, configuration. Read at the end of this doc to learn
> >> exactly
> >>>>> what
> >>>>>> will change for the user (in high level).
> >>>>>>
> >>>>>> In my opinion, it will make Pulsar user experience so much better,
> >> they
> >>>>>> will want to migrate to it, despite the breaking change.
> >>>>>>
> >>>>>> This was a very short summary. You are most welcomed to read the
> full
> >>>>>> design document below and express feedback, so we can make it
> better.
> >>>>>>
> >>>>>> On Sun, May 7, 2023 at 7:52 PM Asaf Mesika <as...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, May 7, 2023 at 4:23 PM Yunze Xu
> >> <yz...@streamnative.io.invalid>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I'm excited to learn much more about metrics when I started
> reading
> >>>>>>>> this proposal. But I became more and more frustrated when I found
> >>>>>>>> there is still too much content left even if I've already spent
> >> much
> >>>>>>>> time reading this proposal. I'm wondering how much time did you
> >> expect
> >>>>>>>> reviewers to read through this proposal? I just recalled the
> >>>>>>>> discussion you started before [1]. Did you expect each PMC member
> >> that
> >>>>>>>> gives his/her +1 to read only parts of this proposal?
> >>>>>>>>
> >>>>>>>
> >>>>>>> I estimated around 2 hours needed for a reviewer.
> >>>>>>> I hate it being so long, but I simply couldn't find a way to
> >> downsize it
> >>>>>>> more. Furthermore, I consulted with my colleagues including Matteo,
> >> but
> >>>>> we
> >>>>>>> couldn't see a way to scope it down.
> >>>>>>> Why? Because once you begin this journey, you need to know how it's
> >>>>> going
> >>>>>>> to end.
> >>>>>>> What I ended up doing, is writing all the crucial details for
> >> review in
> >>>>>>> the High Level Design section.
> >>>>>>> It's still a big, hefty section, but I don't think I can step out
> >> or let
> >>>>>>> anyone else change Pulsar so invasively without the full extent of
> >> the
> >>>>>>> change.
> >>>>>>>
> >>>>>>> I don't think it's wise to read parts.
> >>>>>>> I did my very best effort to minimize it, but the scope is simply
> >> big.
> >>>>>>> Open for suggestions, but it requires reading all the PIP :)
> >>>>>>>
> >>>>>>> Thanks a lot Yunze for dedicating any time to it.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Let's talk back to the proposal, for now, what I mainly learned
> and
> >>>>>>>> are concerned about mostly are:
> >>>>>>>> 1. Pulsar has many ways to expose metrics. It's not unified and
> >>>>> confusing.
> >>>>>>>> 2. The current metrics system cannot support a large amount of
> >> topics.
> >>>>>>>> 3. It's hard for plugin authors to integrate metrics. (For
> example,
> >>>>>>>> KoP [2] integrates metrics by implementing the
> >>>>>>>> PrometheusRawMetricsProvider interface and it indeed needs much
> >> work)
> >>>>>>>>
> >>>>>>>> Regarding the 1st issue, this proposal chooses OpenTelemetry
> >> (OTel).
> >>>>>>>>
> >>>>>>>> Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?"
> >>>>>>>> section. It's still frustrating to see no answer. Eventually, I
> >> found
> >>>>>>>>
> >>>>>>>
> >>>>>>> OpenTelemetry isn't the solution for large amount of topic.
> >>>>>>> The solution is described at
> >>>>>>> "Aggregate and Filtering to solve cardinality issues" section.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> the explanation in the "What we need to fix in OpenTelemetry -
> >>>>>>>> Performance" section. It seems that we still need some
> >> enhancements in
> >>>>>>>> OTel. In other words, currently OTel is not ready for resolving
> all
> >>>>>>>> these issues listed in the proposal but we believe it will.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Let me rephrase "believe" --> we work together with the maintainers
> >> to
> >>>>> do
> >>>>>>> it, yes.
> >>>>>>> I am open for any other suggestion.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> As for the 3rd issue, from the "Integrating with Pulsar Plugins"
> >>>>>>>> section, the plugin authors still need to implement the new OTel
> >>>>>>>> interfaces. Is it much easier than using the existing ways to
> >> expose
> >>>>>>>> metrics? Could metrics still be easily integrated with Grafana?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, it's way easier.
> >>>>>>> Basically you have a full fledged metrics library objects: Meter,
> >> Gauge,
> >>>>>>> Histogram, Counter.
> >>>>>>> No more Raw Metrics Provider, writing UTF-8 bytes in Prometheus
> >> format.
> >>>>>>> You get namespacing for free with Meter name and version.
> >>>>>>> It's way better than current solution and any other library.
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> That's all I am concerned about at the moment. I understand, and
> >>>>>>>> appreciate that you've spent much time studying and explaining all
> >>>>>>>> these things. But, this proposal is still too huge.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I appreciate your effort a lot!
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> [1]
> >> https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
> >>>>>>>> [2]
> >>>>>>>>
> >>>>>
> >>
> https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Yunze
> >>>>>>>>
> >>>>>>>> On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <asaf.mesika@gmail.com
> >
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> I'm very appreciative for feedback from multiple pulsar users and
> >> devs
> >>>>>>>> on
> >>>>>>>>> this PIP, since it has dramatic changes suggested and quite
> >> extensive
> >>>>>>>>> positive change for the users.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <
> >> asaf.mesika@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> I'm very excited to release a PIP I've been working on in the
> >> past 11
> >>>>>>>>>> months, which I think will be immensely valuable to Pulsar,
> >> which I
> >>>>>>>> like so
> >>>>>>>>>> much.
> >>>>>>>>>>
> >>>>>>>>>> PIP: https://github.com/apache/pulsar/issues/20197
> >>>>>>>>>>
> >>>>>>>>>> I'm quoting here the preface:
> >>>>>>>>>>
> >>>>>>>>>> === QUOTE START ===
> >>>>>>>>>>
> >>>>>>>>>> Roughly 11 months ago, I started working on solving the biggest
> >> issue
> >>>>>>>> with
> >>>>>>>>>> Pulsar metrics: the lack of ability to monitor a pulsar broker
> >> with a
> >>>>>>>> large
> >>>>>>>>>> topic count: 10k, 100k, and future support of 1M. This started
> by
> >>>>>>>> mapping
> >>>>>>>>>> the existing functionality and then enumerating all the problems
> >> I
> >>>>>>>> saw (all
> >>>>>>>>>> documented in this doc
> >>>>>>>>>> <
> >>>>>>>>
> >>>>>
> >>
> https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing
> >>>
> >>> I thought we were going to stop using Google docs for PIPs.
> >>>
> >>>>>>>>>
> >>>>>>>>>> ).
> >>>>>>>>>>
> >>>>>>>>>> This PIP is a parent PIP. It aims to gradually solve (using
> >> sub-PIPs)
> >>>>>>>> all
> >>>>>>>>>> the current metric system's problems and provide the ability to
> >>>>>>>> monitor a
> >>>>>>>>>> broker with a large topic count, which is currently lacking. As
> a
> >>>>>>>> parent
> >>>>>>>>>> PIP, it will describe each problem and its solution at a high
> >> level,
> >>>>>>>>>> leaving fine-grained details to the sub-PIPs. The parent PIP
> >> ensures
> >>>>>>>> all
> >>>>>>>>>> solutions align and does not contradict each other.
> >>>>>>>>>>
> >>>>>>>>>> The basic building block to solve the monitoring ability of
> large
> >>>>>>>> topic
> >>>>>>>>>> count is aggregating internally (to topic groups) and adding
> >>>>>>>> fine-grained
> >>>>>>>>>> filtering. We could have shoe-horned it into the existing metric
> >>>>>>>> system,
> >>>>>>>>>> but we thought adding that to a system already ingrained with
> >> many
> >>>>>>>> problems
> >>>>>>>>>> would be wrong and hard to do gradually, as so many things will
> >>>>>>>> break. This
> >>>>>>>>>> is why the second-biggest design decision presented here is
> >>>>>>>> consolidating
> >>>>>>>>>> all existing metric libraries into a single one - OpenTelemetry
> >>>>>>>>>> <https://opentelemetry.io/>. The parent PIP will explain why
> >>>>>>>>>> OpenTelemetry was chosen out of existing solutions and why it
> far
> >>>>>>>> exceeds
> >>>>>>>>>> all other options. I’ve been working closely with the
> >> OpenTelemetry
> >>>>>>>>>> community in the past eight months: brain-storming this
> >> integration,
> >>>>>>>> and
> >>>>>>>>>> raising issues, in an effort to remove serious blockers to make
> >> this
> >>>>>>>>>> migration successful.
> >>>>>>>>>>
> >>>>>>>>>> I made every effort to summarize this document so that it can be
> >>>>>>>> concise
> >>>>>>>>>> yet clear. I understand it is an effort to read it and, more so,
> >>>>>>>> provide
> >>>>>>>>>> meaningful feedback on such a large document; hence I’m very
> >> grateful
> >>>>>>>> for
> >>>>>>>>>> each individual who does so.
> >>>>>>>>>>
> >>>>>>>>>> I think this design will help improve the user experience
> >> immensely,
> >>>>>>>> so it
> >>>>>>>>>> is worth the time spent reading it.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> === QUOTE END ===
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> Asaf Mesika
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>
>
>

Re: [DISCUSS] PIP-264: Enhanced OTel-based metric system

Posted by Asaf Mesika <as...@gmail.com>.
I'm super happy the video helped convey the long PIP in 30 minutes.


On Tue, Jun 6, 2023 at 8:00 PM Dave Fisher <wa...@apache.org> wrote:

> Hi Asaf,
>
>
>
> Just to be clear to all the phrase you used “we decided” along with the
> timeline is not a Pulsar community we, but a StreamNative corporate we.
> This discussion may, or may not become a Pulsar community we.
>
When I say "we estimate" and "to provide," it means this is a community
effort. I'm using the guidance of community members to help me do the
estimate and also implement this, pending the community's decision to
proceed and a vote on this. I didn't say StreamNative in the video, and it's
unrelated to this discussion at all.

This is a community effort.


> I would find this plan much more reasonable if it were broken down as
> follows.
>
> (1) Implement and prove the use of OTel as an option for metrics as they
> work now. Let this community agree with you from experience.
>
> (2) Have a discussion in the Pulsar community about notions like Topic
> Group and Bundles so that we have the best and most operational logic
> around reducing cardinality and controlling load and performance. As long
> as we have bundles and orthogonally add groups we introduce complexity.
> Topic Groups are useless unless there is some understanding of Bundles and
> Load.
>

The utmost goal of this PIP is to solve the pain of running Pulsar
with hundreds of thousands of topics in a cluster.
The "tools" used to achieve that are Topic Groups and Filtering:
* Groups reduce cardinality since their count is on the order of thousands.
* Filtering allows you to filter out all high-cardinality topic-level
metrics. Without it, groups are useless in solving the cardinality issue,
since you have both topic-level (the culprit) and group-level metrics.
* Filtering is also vital since it allows you to "open" a group to see
detailed metrics for all topics contained within it (like a logging level),
but not be flooded with 100 metrics times the number of topics you have. It
enables that by allowing you to choose the specific metric you saw
misbehaving at the group level.

So filtering is vital to the grouping feature.
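
To make the grouping idea concrete, here is a minimal sketch of mapping topics
to groups via wildcard patterns. The group names, topic names, and rule format
are hypothetical illustrations only; the actual configuration is defined in the
PIP itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Minimal sketch of topic-to-group mapping via wildcard patterns.
// Group names and patterns here are hypothetical, for illustration only.
public class TopicGroupSketch {
    // Ordered rules: the first matching pattern wins.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(wildcard("persistent://acme/orders/*"), "orders");
        RULES.put(wildcard("persistent://acme/audit-*/*"), "audit");
    }

    // Translate a simple '*' wildcard into a regex.
    static Pattern wildcard(String glob) {
        return Pattern.compile(Pattern.quote(glob).replace("*", "\\E.*\\Q"));
    }

    static String groupOf(String topic) {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(topic).matches()) {
                return rule.getValue();
            }
        }
        return "default"; // topics without an explicit group fall back here
    }

    public static void main(String[] args) {
        // 100k topics collapse into a handful of group-level series.
        System.out.println(groupOf("persistent://acme/orders/checkout-17"));
        System.out.println(groupOf("persistent://acme/audit-eu/login"));
        System.out.println(groupOf("persistent://acme/misc/foo"));
    }
}
```

The point of the sketch: however many topics exist, the metric cardinality is
bounded by the number of groups the operator defines.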

Also vital is renaming metrics and bringing order to the naming. You can't
filter if you can't "select" a domain using a regular expression (for
example, pulsar_messaging_*, which we can't do today since all messaging
metrics are prefixed with pulsar_ just like all other domains, e.g.
pulsar_transactions). Not to mention some metrics are prefixed pulsar_ and
some brk_. It's almost impossible to do filtering without bringing some
proper order to the names.

So in order to solve the pain of the huge topic count use case, you need all
three: grouping, filtering, and proper metric naming.
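
To illustrate why consistent prefixes matter for filtering, here is a small
sketch. The metric names below are hypothetical examples, not the final names
the PIP will define: with a proper naming scheme a single regex selects a whole
domain, while today's mixed pulsar_/brk_ prefixes defeat such a rule.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: selecting a metric "domain" by name with a regex.
// Metric names below are hypothetical examples, not the final PIP names.
public class DomainFilterSketch {
    public static List<String> select(List<String> metrics, String regex) {
        Pattern p = Pattern.compile(regex);
        return metrics.stream()
                .filter(m -> p.matcher(m).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // With consistent naming, one regex captures the messaging domain:
        List<String> renamed = List.of(
                "pulsar_messaging_publish_rate",
                "pulsar_messaging_backlog_size",
                "pulsar_transaction_active_count");
        System.out.println(select(renamed, "pulsar_messaging_.*")); // 2 matches

        // Today's mix of pulsar_* and brk_* prefixes has no common domain
        // prefix to anchor a rule on:
        List<String> current = List.of(
                "pulsar_rate_in", "brk_ml_AddEntryLatency", "pulsar_txn_active_count");
        System.out.println(select(current, "pulsar_messaging_.*")); // 0 matches
    }
}
```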

You can't implement filtering in today's system. You have 4 different
libraries, and I can't implement filtering for each one. Not to mention the
near-impossible interface we gave to the plugins in the form of the Raw
Metrics Provider, which is just bytes written out to a stream.

So basically, you have to consolidate all 4 metrics libraries into one to
implement filtering - i.e. switching to OpenTelemetry. All the other
libraries either can't support Filtering at scale or lack vital features we
need in a metrics library. I listed the exact reasons in a special section
in the PIP.

To summarize: Solving large topic count, which is the foremost goal of this
PIP, requires grouping, filtering, and switching to a single library (OTel).

Before I go on this huge road, I need the consent of the community on this
basic idea as a whole.
There is no point in spending a huge effort on switching to OpenTelemetry
without knowing I can proceed to the next steps, which are grouping and
filtering.
Having just OTel won't achieve the goal I described.

This is why I'm having this discussion now and didn't break it down into
two PIPs or 10 PIPs. This is why I created a parent PIP - to give a
coherent high-level solution to the stated goals before I proceed.

I'm quoting the PIP from the preface:

> This PIP is a parent PIP. It aims to gradually solve (using sub-PIPs) all
> the current metric system's problems and provide the ability to monitor a
> broker with a large topic count, which is currently lacking. As a parent
> PIP, it will describe each problem and its solution at a high level,
> leaving fine-grained details to the sub-PIPs. The parent PIP ensures all
> solutions align and does not contradict each other.


Of course, this will be done in parts, broken into many sub-PIPs.



You mentioned bundles.
Currently, bundles can't serve as the cardinality-reducer unit - they are
not user-controlled.

What do you mean by "Topic Groups are useless unless there is some
understanding of Bundles and Load."

Bundles are a key design choice made years ago to load balance Pulsar nodes.
Topic groups are completely unrelated to bundles or load.
Even if you come up with a unique design that introduces an abstraction on
top of topics, it probably won't be user-controlled like groups - it will
probably be automatic. I hope I managed to explain in the PIP and here why
the user controlling which topics are included in each group is essential.
Right now, there is nothing in motion in that regard, and as I explained,
it's unrelated to this proposal.



> (3) Perhaps the issues of 100s of metrics should be turned to being
> intentional about what metrics are helpful and then slowly switching to
> those that the whole community finds are most helpful in their operations.
>
I think most of the time you don't need 90% of the metrics and you want
them filtered out, until that one time you are facing a severe issue and you
want all the metrics you can get to solve it - hence you bring them back out
of the filter.
I don't think we can simply delete those metrics.
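
The "filter by default, open when needed" workflow resembles the logging-level
analogy used earlier in the thread. Below is a rough sketch; the rule model
and granularity names are invented for illustration, not the PIP's actual
configuration syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of dynamic, rule-based metric filtering: a default rule keeps only
// coarse granularity, and an override "opens" one group to topic level.
// The rule model and granularity names are illustrative, not the PIP's syntax.
public class FilterRulesSketch {
    enum Granularity { NAMESPACE, GROUP, TOPIC }

    record Rule(String groupPattern, Granularity maxGranularity) {}

    private final List<Rule> rules = new ArrayList<>();

    FilterRulesSketch() {
        rules.add(new Rule(".*", Granularity.GROUP)); // default: group level only
    }

    // Dynamically "open" a group, like raising a logger to DEBUG.
    void openGroup(String groupPattern) {
        rules.add(0, new Rule(groupPattern, Granularity.TOPIC));
    }

    boolean keep(String group, Granularity metricGranularity) {
        for (Rule r : rules) { // first matching rule wins
            if (group.matches(r.groupPattern())) {
                return metricGranularity.ordinal() <= r.maxGranularity().ordinal();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FilterRulesSketch filter = new FilterRulesSketch();
        System.out.println(filter.keep("orders", Granularity.TOPIC)); // false
        filter.openGroup("orders"); // a misbehaving group is "opened"
        System.out.println(filter.keep("orders", Granularity.TOPIC)); // true
        System.out.println(filter.keep("audit", Granularity.TOPIC));  // still false
    }
}
```

Once the incident is resolved, removing the override rule "closes" the group
again, just as one would reset a logger back to its default level.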


>
> Only by doing these three steps carefully in the Open on this list in the
> Community can there be enough consensus that the whole change is acceptable
> for Pulsar 4.0 in 18 months.
>

I am doing exactly that - very carefully - I've taken 11 months to design this.
It's as open as it can be. I shared my intent back in July, before I started,
in the community meetings. I published a preliminary idea doc to the
community in October, and then went heads-down to solve this huge challenge
and wrote the PIP, released at the end of April.

Thanks,

Asaf


>
> Best,
> Dave
>
> > On May 21, 2023, at 9:00 AM, Asaf Mesika <as...@gmail.com> wrote:
> >
> > Thanks for the reply, Enrico.
> > Completely agree.
> > This made me realize my TL;DR wasn't talking about export.
> > I added this to it:
> >
> > ---
> > Pulsar OTel Metrics will support exporting as Prometheus HTTP endpoint
> > (`/metrics` but different port) for backward compatibility and also OTLP,
> > so you can push the metrics to OTel Collector and from there ship it to
> any
> > destination.
> > ---
> >
> > OTel supports two kinds of exporter: Prometheus (HTTP) and OTLP (push).
> > We'll just configure to use them.
> >
> >
> >
> > On Mon, May 15, 2023 at 10:35 AM Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> >> Asaf,
> >> thanks for contributing in this area.
> >> Metrics are a fundamental feature of Pulsar.
> >>
> >> Currently I find it very awkward to maintain metrics, and also I see
> >> it as a problem to support only Prometheus.
> >>
> >> Regarding your proposal, IIRC in the past someone else proposed to
> >> support other metrics systems and they have been suggested to use a
> >> sidecar approach,
> >> that is to add something next to Pulsar services that served the
> >> metrics in the preferred format/way.
> >> I find that the sidecar approach is too inefficient and I am not
> >> proposing it (but I wanted to add this reference for the benefit of
> >> new people on the list).
> >>
> >> I wonder if it would be possible to keep compatibility with the
> >> current Prometheus based metrics.
> >> Now Pulsar reached a point in which is is widely used by many
> >> companies and also with big clusters,
> >> telling people that they have to rework all the infrastructure related
> >> to metrics because we don't support Prometheus anymore or because we
> >> changed radically the way we publish metrics
> >> It is a step that seems too hard from my point of view.
> >>
> >> Currently I believe that compatibility is more important than
> >> versatility, and if we want to introduce new (and far better) features
> >> we must take it into account.
> >>
> >> So my point is that I generally support the idea of opening the way to
> >> Open Telemetry, but we must have a way to not force all of our users
> >> to throw away their alerting systems, dashboards and know-how in
> >> troubleshooting Pulsar problems in production and dev
> >>
> >> Best regards
> >> Enrico
> >>
> >> Il giorno lun 15 mag 2023 alle ore 02:17 Dave Fisher
> >> <wa...@comcast.net> ha scritto:
> >>>
> >>>
> >>>
> >>>> On May 10, 2023, at 1:01 AM, Asaf Mesika <as...@gmail.com>
> >> wrote:
> >>>>
> >>>> On Tue, May 9, 2023 at 11:29 PM Dave Fisher <wa...@apache.org> wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>>>> On May 8, 2023, at 2:49 AM, Asaf Mesika <as...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> Your feedback made me realized I need to add "TL;DR" section, which
> I
> >>>>> just
> >>>>>> added.
> >>>>>>
> >>>>>> I'm quoting it here. It gives a brief summary of the proposal, which
> >>>>>> requires up to 5 min of read time, helping you get a high level
> >> picture
> >>>>>> before you dive into the background/motivation/solution.
> >>>>>>
> >>>>>> ----------------------
> >>>>>> TL;DR
> >>>>>>
> >>>>>> Working with Metrics today as a user or a developer is hard and has
> >> many
> >>>>>> severe issues.
> >>>>>>
> >>>>>> From the user perspective:
> >>>>>>
> >>>>>> - One of Pulsar strongest feature is "cheap" topics so you can
> >> easily
> >>>>>> have 10k - 100k topics per broker. Once you do that, you quickly
> >> learn
> >>>>> that
> >>>>>> the amount of metrics you export via "/metrics" (Prometheus style
> >>>>> endpoint)
> >>>>>> becomes really big. The cost to store them becomes too high, queries
> >>>>>> time-out or even "/metrics" endpoint it self times out.
> >>>>>> The only option Pulsar gives you today is all-or-nothing filtering
> >> and
> >>>>>> very crude aggregation. You switch metrics from topic aggregation
> >>>>> level to
> >>>>>> namespace aggregation level. Also you can turn off producer and
> >>>>> consumer
> >>>>>> level metrics. You end up doing it all leaving you "blind", looking
> >> at
> >>>>> the
> >>>>>> metrics from a namespace level which is too high level. You end up
> >>>>>> conjuring all kinds of scripts on top of topic stats endpoint to
> >> glue
> >>>>> some
> >>>>>> aggregated metrics view for the topics you need.
> >>>>>> - Summaries (metric type giving you quantiles like p95) which are
> >> used
> >>>>>> in Pulsar, can't be aggregated across topics / brokers due its
> >> inherent
> >>>>>> design.
> >>>>>> - Plugin authors spend too much time on defining and exposing
> >> metrics
> >>>>> to
> >>>>>> Pulsar since the only interface Pulsar offers is writing your
> >> metrics
> >>>>> by
> >>>>>> your self as UTF-8 bytes in Prometheus Text Format to byte stream
> >>>>> interface
> >>>>>> given to you.
> >>>>>> - Pulsar histograms are exported in a way that is not conformant
> >> with
> >>>>>> Prometheus, which means you can't get the p95 quantile on such
> >>>>> histograms,
> >>>>>> making them very hard to use in day to day life.
> >>>>>
> >>>>> What version of DataSketches is used to produce the histogram? Is is
> >> still
> >>>>> an old Yahoo one, or are we using an updated one from Apache
> >> DataSketches?
> >>>>>
> >>>>> Seems like this is a single PR/small PIP for 3.1?
> >>>>
> >>>>
> >>>> Histograms are a list of buckets, each is a counter.
> >>>> Summary is a collection of values collected over a time window, which
> >> at
> >>>> the end you get a calculation of the quantiles of those values: p95,
> >> p50,
> >>>> and those are exported from Pulsar.
> >>>>
> >>>> Pulsar histogram do not use Data Sketches.
> >>>
> >>> Bookkeeper Metrics wraps Yahoo DataSketches last I checked.
> >>>
> >>>> They are just counters.
> >>>> They are not adhere to Prometheus since:
> >>>> a. The counter is expected to be cumulative, but Pulsar resets each
> >> bucket
> >>>> counter to 0 every 1 min
> >>>> b. The bucket upper range is expected to be written as an attribute
> >> "le"
> >>>> but today it is encoded in the name of the metric itself.
> >>>>
> >>>> This is a breaking change, hence hard to mark in any small release.
> >>>> This is why it's part of this PIP since so many things will break, and
> >> all
> >>>> of them will break on a separate layer (OTel metrics), hence not break
> >>>> anyone without their consent.
> >>>
> >>> If this change will break existing Grafana dashboards and other
> >> operational monitoring already in place then it will break guarantees we
> >> have made about safely being able to downgrade from a bad upgrade.
> >>>
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>>
> >>>>>> - Too many metrics are rates which also delta reset every interval
> >> you
> >>>>>> configure in Pulsar and restart, instead of relying on cumulative
> >> (ever
> >>>>>> growing) counters and let Prometheus use its rate function.
> >>>>>> - and many more issues
> >>>>>>
> >>>>>> From the developer perspective:
> >>>>>>
> >>>>>> - There are 4 different ways to define and record metrics in Pulsar:
> >>>>>> Pulsar own metrics library, Prometheus Java Client, Bookkeeper
> >> metrics
> >>>>>> library and plain native Java SDK objects (AtomicLong, ...). It's
> >> very
> >>>>>> confusing for the developer and create inconsistencies for the end
> >> user
> >>>>>> (e.g. Summary for example is different in each).
> >>>>>> - Patching your metrics into "/metrics" Prometheus endpoint is
> >>>>>> confusing, cumbersome and error prone.
> >>>>>> - many more
> >>>>>>
> >>>>>> This proposal offers several key changes to solve that:
> >>>>>>
> >>>>>> - Cardinality (supporting 10k-100k topics per broker) is solved by
> >>>>>> introducing a new aggregation level for metrics called Topic Metric
> >>>>> Group.
> >>>>>> Using configuration, you specify for each topic its group (using
> >>>>>> wildcard/regex). This allows you to "zoom" out to a more detailed
> >>>>>> granularity level like groups instead of namespaces, which you
> >> control
> >>>>> how
> >>>>>> many groups you'll have hence solving the cardinality issue, without
> >>>>>> sacrificing level of detail too much.
> >>>>>> - Fine-grained filtering mechanism, dynamic. You'll have rule-based
> >>>>>> dynamic configuration, allowing you to specify per
> >>>>> namespace/topic/group
> >>>>>> which metrics you'd like to keep/drop. Rules allows you to set the
> >>>>> default
> >>>>>> to have small amount of metrics in group and namespace level only
> >> and
> >>>>> drop
> >>>>>> the rest. When needed, you can add an override rule to "open" up a
> >>>>> certain
> >>>>>> group to have more metrics in higher granularity (topic or even
> >>>>>> consumer/producer level). Since it's dynamic you "open" such a group
> >>>>> when
> >>>>>> you see it's misbehaving, see it in topic level, and when all
> >>>>> resolved, you
> >>>>>> can "close" it. A bit similar experience to logging levels in Log4j
> >> or
> >>>>>> Logback, that you default and override per class/package.
> >>>>>>
> >>>>>> Aggregation and Filtering combined solves the cardinality without
> >>>>>> sacrificing the level of detail when needed and most importantly,
> you
> >>>>>> determine which topic/group/namespace it happens on.
> >>>>>>
> >>>>>> Since this change is so invasive, it requires a single metrics
> >> library to
> >>>>>> implement all of it on top of; Hence the third big change point is
> >>>>>> consolidating all four ways to define and record metrics to a single
> >>>>> one, a
> >>>>>> new one: OpenTelemetry Metrics (Java SDK, and also Python and Go for
> >> the
> >>>>>> Pulsar Function runners).
> >>>>>> Introducing OpenTelemetry (OTel) solves also the biggest pain point
> >> from
> >>>>>> the developer perspective, since it's a superb metrics library
> >> offering
> >>>>>> everything you need, and there is going to be a single way - only
> it.
> >>>>> Also,
> >>>>>> it solves the robustness for Plugin author which will use
> >> OpenTelemetry.
> >>>>> It
> >>>>>> so happens that it also solves all the numerous problems described
> >> in the
> >>>>>> doc itself.
> >>>>>>
> >>>>>> The solution will be introduced as another layer with feature
> >> toggles, so
> >>>>>> you can work with existing system, and/or OTel, until gradually
> >>>>> deprecating
> >>>>>> existing system.
> >>>>>>
> >>>>>> It's a big breaking change for Pulsar users on many fronts: names,
> >>>>>> semantics, configuration. Read at the end of this doc to learn
> >> exactly
> >>>>> what
> >>>>>> will change for the user (in high level).
> >>>>>>
> >>>>>> In my opinion, it will make Pulsar user experience so much better,
> >> they
> >>>>>> will want to migrate to it, despite the breaking change.
> >>>>>>
> >>>>>> This was a very short summary. You are most welcomed to read the
> full
> >>>>>> design document below and express feedback, so we can make it
> better.
> >>>>>>
> >>>>>> On Sun, May 7, 2023 at 7:52 PM Asaf Mesika <as...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, May 7, 2023 at 4:23 PM Yunze Xu
> >> <yz...@streamnative.io.invalid>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I'm excited to learn much more about metrics when I started
> reading
> >>>>>>>> this proposal. But I became more and more frustrated when I found
> >>>>>>>> there is still too much content left even if I've already spent
> >> much
> >>>>>>>> time reading this proposal. I'm wondering how much time did you
> >> expect
> >>>>>>>> reviewers to read through this proposal? I just recalled the
> >>>>>>>> discussion you started before [1]. Did you expect each PMC member
> >> that
> >>>>>>>> gives his/her +1 to read only parts of this proposal?
> >>>>>>>>
> >>>>>>>
> >>>>>>> I estimated around 2 hours needed for a reviewer.
> >>>>>>> I hate it being so long, but I simply couldn't find a way to
> >> downsize it
> >>>>>>> more. Furthermore, I consulted with my colleagues including Matteo,
> >> but
> >>>>> we
> >>>>>>> couldn't see a way to scope it down.
> >>>>>>> Why? Because once you begin this journey, you need to know how it's
> >>>>> going
> >>>>>>> to end.
> >>>>>>> What I ended up doing, is writing all the crucial details for
> >> review in
> >>>>>>> the High Level Design section.
> >>>>>>> It's still a big, hefty section, but I don't think I can step out
> >> or let
> >>>>>>> anyone else change Pulsar so invasively without the full extent of
> >> the
> >>>>>>> change.
> >>>>>>>
> >>>>>>> I don't think it's wise to read parts.
> >>>>>>> I did my very best effort to minimize it, but the scope is simply
> >> big.
> >>>>>>> Open for suggestions, but it requires reading all the PIP :)
> >>>>>>>
> >>>>>>> Thanks a lot Yunze for dedicating any time to it.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Let's talk back to the proposal, for now, what I mainly learned
> and
> >>>>>>>> are concerned about mostly are:
> >>>>>>>> 1. Pulsar has many ways to expose metrics. It's not unified and
> >>>>> confusing.
> >>>>>>>> 2. The current metrics system cannot support a large amount of
> >> topics.
> >>>>>>>> 3. It's hard for plugin authors to integrate metrics. (For
> example,
> >>>>>>>> KoP [2] integrates metrics by implementing the
> >>>>>>>> PrometheusRawMetricsProvider interface and it indeed needs much
> >> work)
> >>>>>>>>
> >>>>>>>> Regarding the 1st issue, this proposal chooses OpenTelemetry
> >> (OTel).
> >>>>>>>>
> >>>>>>>> Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?"
> >>>>>>>> section. It's still frustrating to see no answer. Eventually, I
> >> found
> >>>>>>>>
> >>>>>>>
> >>>>>>> OpenTelemetry isn't the solution for large amount of topic.
> >>>>>>> The solution is described at
> >>>>>>> "Aggregate and Filtering to solve cardinality issues" section.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> the explanation in the "What we need to fix in OpenTelemetry -
> >>>>>>>> Performance" section. It seems that we still need some
> >> enhancements in
> >>>>>>>> OTel. In other words, currently OTel is not ready for resolving
> all
> >>>>>>>> these issues listed in the proposal but we believe it will.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Let me rephrase "believe" --> we work together with the maintainers
> >> to
> >>>>> do
> >>>>>>> it, yes.
> >>>>>>> I am open for any other suggestion.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> As for the 3rd issue, from the "Integrating with Pulsar Plugins"
> >>>>>>>> section, the plugin authors still need to implement the new OTel
> >>>>>>>> interfaces. Is it much easier than using the existing ways to
> >> expose
> >>>>>>>> metrics? Could metrics still be easily integrated with Grafana?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, it's way easier.
> >>>>>>> Basically you have a full fledged metrics library objects: Meter,
> >> Gauge,
> >>>>>>> Histogram, Counter.
> >>>>>>> No more Raw Metrics Provider, writing UTF-8 bytes in Prometheus
> >> format.
> >>>>>>> You get namespacing for free with Meter name and version.
> >>>>>>> It's way better than current solution and any other library.
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> That's all I am concerned about at the moment. I understand, and
> >>>>>>>> appreciate that you've spent much time studying and explaining all
> >>>>>>>> these things. But, this proposal is still too huge.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I appreciate your effort a lot!
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> [1]
> >> https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
> >>>>>>>> [2]
> >>>>>>>>
> >>>>>
> >>
> https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Yunze
> >>>>>>>>
> >>>>>>>> On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <asaf.mesika@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> I'd very much appreciate feedback from multiple Pulsar users and
> >>>>>>>>> devs on this PIP, since it suggests dramatic changes and quite an
> >>>>>>>>> extensive positive change for users.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <
> >> asaf.mesika@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> I'm very excited to release a PIP I've been working on over the
> >>>>>>>>>> past 11 months, one which I think will be immensely valuable to
> >>>>>>>>>> Pulsar, a project I like so much.
> >>>>>>>>>>
> >>>>>>>>>> PIP: https://github.com/apache/pulsar/issues/20197
> >>>>>>>>>>
> >>>>>>>>>> I'm quoting here the preface:
> >>>>>>>>>>
> >>>>>>>>>> === QUOTE START ===
> >>>>>>>>>>
> >>>>>>>>>> Roughly 11 months ago, I started working on solving the biggest
> >>>>>>>>>> issue with Pulsar metrics: the inability to monitor a Pulsar
> >>>>>>>>>> broker with a large topic count: 10k, 100k, and, in the future,
> >>>>>>>>>> 1M. This started with mapping the existing functionality and then
> >>>>>>>>>> enumerating all the problems I saw (all
> >>>>>>>>>> documented in this doc
> >>>>>>>>>> <
> >>>>>>>>
> >>>>>
> >>
> https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing
> >>>
> >>> I thought we were going to stop using Google docs for PIPs.
> >>>
> >>>>>>>>>
> >>>>>>>>>> ).
> >>>>>>>>>>
> >>>>>>>>>> This PIP is a parent PIP. It aims to gradually solve (using
> >> sub-PIPs)
> >>>>>>>> all
> >>>>>>>>>> the current metric system's problems and provide the ability to
> >>>>>>>> monitor a
> >>>>>>>>>> broker with a large topic count, which is currently lacking. As
> a
> >>>>>>>> parent
> >>>>>>>>>> PIP, it will describe each problem and its solution at a high
> >> level,
> >>>>>>>>>> leaving fine-grained details to the sub-PIPs. The parent PIP
> >>>>>>>>>> ensures all solutions align and do not contradict each other.
> >>>>>>>>>>
> >>>>>>>>>> The basic building block for making a large topic count
> >>>>>>>>>> monitorable is aggregating internally (into topic groups) and
> >>>>>>>>>> adding fine-grained filtering. We could have shoe-horned it into
> >>>>>>>>>> the existing metric system,
> >>>>>>>>>> but we thought adding that to a system already ingrained with
> >>>>>>>>>> many problems would be wrong and hard to do gradually, as so many
> >>>>>>>>>> things would break. This
> >>>>>>>>>> is why the second-biggest design decision presented here is
> >>>>>>>> consolidating
> >>>>>>>>>> all existing metric libraries into a single one - OpenTelemetry
> >>>>>>>>>> <https://opentelemetry.io/>. The parent PIP will explain why
> >>>>>>>>>> OpenTelemetry was chosen out of existing solutions and why it
> far
> >>>>>>>> exceeds
> >>>>>>>>>> all other options. I’ve been working closely with the
> >>>>>>>>>> OpenTelemetry community over the past eight months: brainstorming
> >>>>>>>>>> this integration and raising issues, in an effort to remove
> >>>>>>>>>> serious blockers and make this migration successful.
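[Editor's illustration] The cardinality-reduction idea in the paragraph above can be sketched in dependency-free Java. This is a hypothetical illustration of the concept only, not Pulsar's actual implementation; the grouping rule and class names are invented:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of aggregating per-topic measurements into topic groups: the
// per-topic label is dropped before storage, so 100k topics can collapse
// into a handful of time series.
public class TopicGroupAggregation {
    private final Function<String, String> topicToGroup; // hypothetical grouping rule
    private final Map<String, Long> messagesInPerGroup = new HashMap<>();

    public TopicGroupAggregation(Function<String, String> topicToGroup) {
        this.topicToGroup = topicToGroup;
    }

    public void recordMessageIn(String topic) {
        // Only the group survives as a metric attribute.
        messagesInPerGroup.merge(topicToGroup.apply(topic), 1L, Long::sum);
    }

    public long groupTotal(String group) {
        return messagesInPerGroup.getOrDefault(group, 0L);
    }

    public static void main(String[] args) {
        // Assume a rule that maps the first path segment to a group.
        TopicGroupAggregation agg =
                new TopicGroupAggregation(t -> t.split("/")[0]);
        agg.recordMessageIn("orders/topic-1");
        agg.recordMessageIn("orders/topic-2");
        agg.recordMessageIn("billing/topic-9");
        System.out.println(agg.groupTotal("orders"));  // 2
        System.out.println(agg.groupTotal("billing")); // 1
    }
}
```

In an OpenTelemetry-based system the same effect could presumably be achieved declaratively, e.g. via SDK views that filter out the high-cardinality attribute, rather than hand-rolled maps.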
> >>>>>>>>>>
> >>>>>>>>>> I made every effort to summarize this document so that it is
> >>>>>>>>>> concise yet clear. I understand that reading it, and more so
> >>>>>>>>>> providing meaningful feedback on such a large document, takes
> >>>>>>>>>> effort; hence I’m very grateful to each individual who does so.
> >>>>>>>>>>
> >>>>>>>>>> I think this design will help improve the user experience
> >> immensely,
> >>>>>>>> so it
> >>>>>>>>>> is worth the time spent reading it.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> === QUOTE END ===
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> Asaf Mesika
> >>>>>>>>>>