You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by "merlimat (via GitHub)" <gi...@apache.org> on 2024/03/03 03:05:12 UTC

[PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

merlimat opened a new pull request, #22178:
URL: https://github.com/apache/pulsar/pull/22178

   <!--
   ### Contribution Checklist
     
     - PR title format should be *[type][component] summary*. For details, see *[Guideline - Pulsar PR Naming Convention](https://pulsar.apache.org/contribute/develop-semantic-title/)*. 
   
     - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.
     
     - Each pull request should address only one issue, not mix up code from multiple issues.
     
     - Each commit in the pull request has a meaningful commit message
   
     - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below.
   -->
   
   <!-- Either this PR fixes an issue, -->
   
   Fixes #xyz
   
   <!-- or this PR is one task of an issue -->
   
   Main Issue: #xyz
   
   <!-- If the PR belongs to a PIP, please add the PIP link here -->
   
   PIP: #xyz 
   
   <!-- Details of when a PIP is required and how the PIP process work, please see: https://github.com/apache/pulsar/blob/master/pip/README.md -->
   
   ### Motivation
   
   <!-- Explain here the context, and why you're making that change. What is the problem you're trying to solve. -->
   
   ### Modifications
   
   <!-- Describe the modifications you've done. -->
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Does this pull request potentially affect one of the following parts:
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [x] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   ### Matching PR in forked repository
   
   PR in forked repository: <!-- ENTER URL HERE -->
   
   <!--
   After opening this PR, the build in apache/pulsar will fail and instructions will
   be provided for opening a PR in the PR author's forked repository.
   
   apache/pulsar pull requests should be first tested in your own fork since the 
   apache/pulsar CI based on GitHub Actions has constrained resources and quota.
   GitHub Actions provides separate quota for pull requests that are executed in 
   a forked repository.
   
   The tests will be run in the forked repository until all PR review comments have
   been handled, the tests pass and the PR is approved by a reviewer.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "nodece (via GitHub)" <gi...@apache.org>.
nodece commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1524456355


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |

Review Comment:
   `pulsar.producer.name/id` should be added.



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |

Review Comment:
   `pulsar.producer.name/id` should be added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1520081179


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -166,36 +122,51 @@ this is the proposed initial set of metrics to export.
 Additional metrics could be added later on, though it's better to start with the set of most important metrics
 and then evaluate any missing information.
 
-| OTel metric name                                | Type      | Unit        | Description                                                                                    |
-|-------------------------------------------------|-----------|-------------|------------------------------------------------------------------------------------------------|
-| `pulsar.client.connections.opened`              | Counter   | connections | Counter of connections opened                                                                  |
-| `pulsar.client.connections.closed`              | Counter   | connections | Counter of connections closed                                                                  |
-| `pulsar.client.connections.failed`              | Counter   | connections | Counter of connections establishment failures                                                  |
-| `pulsar.client.session.opened`                  | Counter   | sessions    | Counter of sessions opened. `type="producer"` or `consumer`                                    |
-| `pulsar.client.session.closed`                  | Counter   | sessions    | Counter of sessions closed. `type="producer"` or `consumer`                                    |
-| `pulsar.client.received`                        | Counter   | messages    | Number of messages received                                                                    |
-| `pulsar.client.received`                        | Counter   | bytes       | Number of bytes received                                                                       |
-| `pulsar.client.consumer.preteched.messages`     | Gauge     | messages    | Number of messages currently sitting in the consumer pre-fetch queue                           |
-| `pulsar.client.consumer.preteched`              | Gauge     | bytes       | Total number of bytes currently sitting in the consumer pre-fetch queue                        |
-| `pulsar.client.consumer.ack`                    | Counter   | messages    | Number of ack operations                                                                       |
-| `pulsar.client.consumer.nack`                   | Counter   | messages    | Number of negative ack operations                                                              |
-| `pulsar.client.consumer.dlq`                    | Counter   | messages    | Number of messages sent to DLQ                                                                 |
-| `pulsar.client.consumer.ack.timeout`            | Counter   | messages    | Number of ack timeouts events                                                                  |
-| `pulsar.client.producer.latency`                | Histogram | seconds     | Publish latency experienced by the application, includes client batching time                  |
-| `pulsar.client.producer.rpc.latency`            | Histogram | seconds     | Publish RPC latency experienced internally by the client when sending data to receiving an ack |
-| `pulsar.client.producer.published`              | Counter   | bytes       | Bytes published                                                                                |
-| `pulsar.client.producer.pending.messages.count` | Gauge     | messages    | Pending messages for this producer                                                             |
-| `pulsar.client.producer.pending.count`          | Gauge     | bytes       | Pending bytes for this producer                                                                |
+| OTel metric name                                | Type      | Unit        | Description                                                                                                                               |
+|-------------------------------------------------|-----------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter   | connections | The number of connections opened                                                                                                          |

Review Comment:
   I agree that it should be included in as part of the documentation. As part of this proposal, it's explained that where appropriate, the metrics are going to be tagged with tenant/topic/etc labal.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1513733755


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);

Review Comment:
   Updated, it's still not clear to me how to configure it though. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1524151273


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | UpDownCounter | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages currently sitting in the consumer receive queue                                                                    |
+| `pulsar.client.consumer.receive_queue.size`     | UpDownCounter | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The total size in bytes of messages currently sitting in the consumer receive queue                                                       |
+| `pulsar.client.consumer.message.ack`            | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of acknowledged messages                                                                                                       |
+| `pulsar.client.consumer.message.nack`           | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of negatively acknowledged messages                                                                                            |
+| `pulsar.client.consumer.message.dlq`            | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages sent to DLQ                                                                                                        |
+| `pulsar.client.consumer.message.ack.timeout`    | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages that were not acknowledged in the configured timeout period, hence, were requested by the client to be redelivered |
+| `pulsar.client.producer.message.send.duration`  | Histogram     | seconds     | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | Publish latency experienced by the application, includes client batching time                                                             |
+| `pulsar.client.producer.rpc.send.duration`      | Histogram     | seconds     | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.response.status="success\|failed"`                                | Publish RPC latency experienced internally by the client when sending data to receiving an ack                                            |
+| `pulsar.client.producer.message.send.size`      | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.response.status="success\|failed"`                                | The number of bytes published                                                                                                             |

Review Comment:
   The send latency histogram includes a "count", which can be used to compute the publish rate



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1514915145


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);

Review Comment:
   They have an experimental interface called `ExtendedDoubleCounterBuilder` (the same name for any other type).
   You use the `Meter` to obtain a builder, then cast the builder to `ExtendedDoubleCounterBuilder` and continue to set your name, unit, advice, and build.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1518852220


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);

Review Comment:
   They would define a view for that instrument, and in it they would override the attributes that would be recorded.
   They have 2 ways. One is programmatic, if they created the SDK them selfs completetly:
   ```
                   .registerView(
                           InstrumentSelector.builder()
                                   .setMeterName("hikari")
                                   .setName("http.request.latency")
                                   .build(),
                           View.builder()
                                   .setAttributeFilter(attrName -> attrName.equals("statusCode"))
                                   .build())
   ```
   
   The AttributeFilter is a Predicate that dictates which attributes to record.
   You use `record(attr, 13)`, and say `attr` is `(tenant, namespace, topic)`, then when filter is applied for this instrument, each `attr` will be passed through the filter which will produce a new `Attributes` containing only those attribute keys which passed the filter. 
   I never tested this from a performance perspective. I presume since it's on the client side, it might be negligible. In theory this area of the OTel SDk can be improved by introducing some caching mechanism.
   
   A second option, which I believe is more likely to be used: configuration file.
   The AutoConfigured SDK builder can read a configuration file and configure it self according to it. It will replace ENV variables. The JSON schema for this file is defined a shared repository all SDK uses. Here's an example in which you can see ho a view is defined: https://github.com/open-telemetry/opentelemetry-configuration/blob/main/examples/kitchen-sink.yaml
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1525302313


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | UpDownCounter | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages currently sitting in the consumer receive queue                                                                    |

Review Comment:
   No since it is to convery one word, and not build an hierarchy.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1525306248


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();

Review Comment:
   aren't you missing the `.openTelemetry(otel)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1518853524


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -166,36 +122,51 @@ this is the proposed initial set of metrics to export.
 Additional metrics could be added later on, though it's better to start with the set of most important metrics
 and then evaluate any missing information.
 
-| OTel metric name                                | Type      | Unit        | Description                                                                                    |
-|-------------------------------------------------|-----------|-------------|------------------------------------------------------------------------------------------------|
-| `pulsar.client.connections.opened`              | Counter   | connections | Counter of connections opened                                                                  |
-| `pulsar.client.connections.closed`              | Counter   | connections | Counter of connections closed                                                                  |
-| `pulsar.client.connections.failed`              | Counter   | connections | Counter of connections establishment failures                                                  |
-| `pulsar.client.session.opened`                  | Counter   | sessions    | Counter of sessions opened. `type="producer"` or `consumer`                                    |
-| `pulsar.client.session.closed`                  | Counter   | sessions    | Counter of sessions closed. `type="producer"` or `consumer`                                    |
-| `pulsar.client.received`                        | Counter   | messages    | Number of messages received                                                                    |
-| `pulsar.client.received`                        | Counter   | bytes       | Number of bytes received                                                                       |
-| `pulsar.client.consumer.preteched.messages`     | Gauge     | messages    | Number of messages currently sitting in the consumer pre-fetch queue                           |
-| `pulsar.client.consumer.preteched`              | Gauge     | bytes       | Total number of bytes currently sitting in the consumer pre-fetch queue                        |
-| `pulsar.client.consumer.ack`                    | Counter   | messages    | Number of ack operations                                                                       |
-| `pulsar.client.consumer.nack`                   | Counter   | messages    | Number of negative ack operations                                                              |
-| `pulsar.client.consumer.dlq`                    | Counter   | messages    | Number of messages sent to DLQ                                                                 |
-| `pulsar.client.consumer.ack.timeout`            | Counter   | messages    | Number of ack timeouts events                                                                  |
-| `pulsar.client.producer.latency`                | Histogram | seconds     | Publish latency experienced by the application, includes client batching time                  |
-| `pulsar.client.producer.rpc.latency`            | Histogram | seconds     | Publish RPC latency experienced internally by the client when sending data to receiving an ack |
-| `pulsar.client.producer.published`              | Counter   | bytes       | Bytes published                                                                                |
-| `pulsar.client.producer.pending.messages.count` | Gauge     | messages    | Pending messages for this producer                                                             |
-| `pulsar.client.producer.pending.count`          | Gauge     | bytes       | Pending bytes for this producer                                                                |
+| OTel metric name                                | Type      | Unit        | Description                                                                                                                               |
+|-------------------------------------------------|-----------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter   | connections | The number of connections opened                                                                                                          |

Review Comment:
   This is what @dragosvictor plans to do (as template) also at server side metrics.
   



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,172 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be disabled by default, deprecated and eventually removed
+in Pulsar 4.0 release.
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+| OTel metric name                                | Type      | Unit        | Description                                                                                                                               |
+|-------------------------------------------------|-----------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter   | connections | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter   | connections | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter   | connections | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter   | sessions    | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter   | sessions    | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter   | sessions    | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter   | sessions    | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter   | messages    | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter   | bytes       | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | Gauge     | messages    | Number of messages currently sitting in the consumer receive queue                                                                        |
+| `pulsar.client.consumer.receive_queue.size`     | Gauge     | bytes       | Total number of bytes currently sitting in the consumer receive queue                                                                     |
+| `pulsar.client.consumer.message.ack`            | Counter   | messages    | The number of acknowledged messages                                                                                                       |
+| `pulsar.client.consumer.message.nack`           | Counter   | messages    | Number of negative ack operations                                                                                                         |
+| `pulsar.client.consumer.message.dlq`            | Counter   | messages    | The number of messages sent to DLQ                                                                                                        |
+| `pulsar.client.consumer.message.ack.timeout`    | Counter   | messages    | The number of messages that were not acknowledged in the configured timeout period, hence, were requested by the client to be redelivered |
+| `pulsar.client.producer.message.send.duration`  | Histogram | seconds     | Publish latency experienced by the application, includes client batching time                                                             |
+| `pulsar.client.producer.rpc.send.duration`      | Histogram | seconds     | Publish RPC latency experienced internally by the client when sending data to receiving an ack                                            |
+| `pulsar.client.producer.message.send.size`      | Counter   | bytes       | The number of bytes published                                                                                                             |
+| `pulsar.client.producer.message.pending.count"` | Gauge     | messages    | The number of messages in the producer internal send queue, waiting to be sent                                                            |

Review Comment:
   Remove `"`
   Gauge --> UpDownCounter: This is true to all `Gauge` in this table.



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -166,36 +122,51 @@ this is the proposed initial set of metrics to export.
 Additional metrics could be added later on, though it's better to start with the set of most important metrics
 and then evaluate any missing information.
 
-| OTel metric name                                | Type      | Unit        | Description                                                                                    |
-|-------------------------------------------------|-----------|-------------|------------------------------------------------------------------------------------------------|
-| `pulsar.client.connections.opened`              | Counter   | connections | Counter of connections opened                                                                  |
-| `pulsar.client.connections.closed`              | Counter   | connections | Counter of connections closed                                                                  |
-| `pulsar.client.connections.failed`              | Counter   | connections | Counter of connections establishment failures                                                  |
-| `pulsar.client.session.opened`                  | Counter   | sessions    | Counter of sessions opened. `type="producer"` or `consumer`                                    |
-| `pulsar.client.session.closed`                  | Counter   | sessions    | Counter of sessions closed. `type="producer"` or `consumer`                                    |
-| `pulsar.client.received`                        | Counter   | messages    | Number of messages received                                                                    |
-| `pulsar.client.received`                        | Counter   | bytes       | Number of bytes received                                                                       |
-| `pulsar.client.consumer.preteched.messages`     | Gauge     | messages    | Number of messages currently sitting in the consumer pre-fetch queue                           |
-| `pulsar.client.consumer.preteched`              | Gauge     | bytes       | Total number of bytes currently sitting in the consumer pre-fetch queue                        |
-| `pulsar.client.consumer.ack`                    | Counter   | messages    | Number of ack operations                                                                       |
-| `pulsar.client.consumer.nack`                   | Counter   | messages    | Number of negative ack operations                                                              |
-| `pulsar.client.consumer.dlq`                    | Counter   | messages    | Number of messages sent to DLQ                                                                 |
-| `pulsar.client.consumer.ack.timeout`            | Counter   | messages    | Number of ack timeouts events                                                                  |
-| `pulsar.client.producer.latency`                | Histogram | seconds     | Publish latency experienced by the application, includes client batching time                  |
-| `pulsar.client.producer.rpc.latency`            | Histogram | seconds     | Publish RPC latency experienced internally by the client when sending data to receiving an ack |
-| `pulsar.client.producer.published`              | Counter   | bytes       | Bytes published                                                                                |
-| `pulsar.client.producer.pending.messages.count` | Gauge     | messages    | Pending messages for this producer                                                             |
-| `pulsar.client.producer.pending.count`          | Gauge     | bytes       | Pending bytes for this producer                                                                |
+| OTel metric name                                | Type      | Unit        | Description                                                                                                                               |
+|-------------------------------------------------|-----------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter   | connections | The number of connections opened                                                                                                          |

Review Comment:
   I think it can be beneficial to include the attributes being recorded. Like k8s is doing: https://kubernetes.io/docs/reference/instrumentation/metrics/.



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,172 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be disabled by default, deprecated and eventually removed
+in Pulsar 4.0 release.
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+| OTel metric name                                | Type      | Unit        | Description                                                                                                                               |
+|-------------------------------------------------|-----------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter   | connections | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter   | connections | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter   | connections | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter   | sessions    | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter   | sessions    | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter   | sessions    | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter   | sessions    | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter   | messages    | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter   | bytes       | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | Gauge     | messages    | Number of messages currently sitting in the consumer receive queue                                                                        |
+| `pulsar.client.consumer.receive_queue.size`     | Gauge     | bytes       | Total number of bytes currently sitting in the consumer receive queue                                                                     |
+| `pulsar.client.consumer.message.ack`            | Counter   | messages    | The number of acknowledged messages                                                                                                       |
+| `pulsar.client.consumer.message.nack`           | Counter   | messages    | Number of negative ack operations                                                                                                         |

Review Comment:
   Matching description above: "The number of negative acknowledged messages"



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder()
+        .build().getOpenTelemetrySdk();
+
+        PulsarClient client=PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Cardinality enum will allow to select a default cardinality label to be attached to the
+metrics:
+
+```java
+public enum MetricsCardinality {
+    /**
+     * Do not add additional labels to metrics
+     */
+    None,
+
+    /**
+     * Label metrics by tenant
+     */
+    Tenant,
+
+    /**
+     * Label metrics by tenant and namespace
+     */
+    Namespace,
+
+    /**
+     * Label metrics by topic
+     */
+    Topic,
+
+    /**
+     * Label metrics by each partition
+     */
+    Partition,
+}
+```
+
+The labels are addictive. For example, selecting `Topic` level would mean that the metrics will be
+labeled like:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public",topic="persistent://public/default/pt"} 149.0
+```
+
+While selecting `Namespace` level:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public"} 149.0
+```
+
+### Deprecating the old stats methods

Review Comment:
   Can we state that the metric initially will be marked experimental, and will stabilized towards v4.0 ?



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,172 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be disabled by default, deprecated and eventually removed
+in Pulsar 4.0 release.
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+| OTel metric name                                | Type      | Unit        | Description                                                                                                                               |
+|-------------------------------------------------|-----------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter   | connections | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter   | connections | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter   | connections | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter   | sessions    | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter   | sessions    | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter   | sessions    | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter   | sessions    | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter   | messages    | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter   | bytes       | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | Gauge     | messages    | Number of messages currently sitting in the consumer receive queue                                                                        |
+| `pulsar.client.consumer.receive_queue.size`     | Gauge     | bytes       | Total number of bytes currently sitting in the consumer receive queue                                                                     |
+| `pulsar.client.consumer.message.ack`            | Counter   | messages    | The number of acknowledged messages                                                                                                       |
+| `pulsar.client.consumer.message.nack`           | Counter   | messages    | Number of negative ack operations                                                                                                         |
+| `pulsar.client.consumer.message.dlq`            | Counter   | messages    | The number of messages sent to DLQ                                                                                                        |
+| `pulsar.client.consumer.message.ack.timeout`    | Counter   | messages    | The number of messages that were not acknowledged in the configured timeout period, hence, were requested by the client to be redelivered |
+| `pulsar.client.producer.message.send.duration`  | Histogram | seconds     | Publish latency experienced by the application, includes client batching time                                                             |
+| `pulsar.client.producer.rpc.send.duration`      | Histogram | seconds     | Publish RPC latency experienced internally by the client when sending data to receiving an ack                                            |
+| `pulsar.client.producer.message.send.size`      | Counter   | bytes       | The number of bytes published                                                                                                             |
+| `pulsar.client.producer.message.pending.count"` | Gauge     | messages    | The number of messages in the producer internal send queue, waiting to be sent                                                            |
+| `pulsar.client.producer.message.pending.size`   | Gauge     | bytes       | The size of the messages in the producer internal queue, waiting to sent                                                                  |
+
+Topic lookup metric will be differentiated by the lookup type label and by the lookup transport
+mechanism (`pulsar.lookup.transport-type="binary|http"`):
+
+| OTel metric name                                         | Type      | Unit    | Description                                       |
+|----------------------------------------------------------|-----------|---------|---------------------------------------------------|
+| `pulsar.client.lookup{pulsar.lookup.type="topic"}`       | Histogram | seconds | Duration of topic lookup operations               |

Review Comment:
   Also `"pulsar.client.lookup.duration`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1518852409


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);

Review Comment:
   This is why I wanted us to add documentation to the path we most likely anticipate people would use Pulsar with OTel.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "nodece (via GitHub)" <gi...@apache.org>.
nodece commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1524459204


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |

Review Comment:
   Consumer metrics are also the same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
asafm commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1513136592


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);
+}
+```
+
+The common usage for an application would be something like:

Review Comment:
   It's worth mentioning that we'll support OTel Java agent (a.k.a automatic instrumentation) users. If they don't specify an OpenTelemetry instance with the builder, we'll default to `GlobalOpenTelemetry.get()`, containing the agent's SDK instance if the agent is used.
   



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder()
+        .build().getOpenTelemetrySdk();
+
+        PulsarClient client=PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Cardinality enum will allow to select a default cardinality label to be attached to the
+metrics:
+
+```java
+public enum MetricsCardinality {
+    /**
+     * Do not add additional labels to metrics
+     */
+    None,
+
+    /**
+     * Label metrics by tenant
+     */
+    Tenant,
+
+    /**
+     * Label metrics by tenant and namespace
+     */
+    Namespace,
+
+    /**
+     * Label metrics by topic
+     */
+    Topic,
+
+    /**
+     * Label metrics by each partition
+     */
+    Partition,
+}
+```
+
+The labels are addictive. For example, selecting `Topic` level would mean that the metrics will be
+labeled like:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public",topic="persistent://public/default/pt"} 149.0
+```
+
+While selecting `Namespace` level:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public"} 149.0
+```
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be disabled by default, deprecated and eventually removed
+in Pulsar 4.0 release.
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+| OTel metric name                                | Type      | Unit        | Description                                                                                    |

Review Comment:
   I commented on metric names in WIP PR, so this table needs to be amended.
   



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);

Review Comment:
   As discussed, it is preferred to use the experimental view advice to default to only the namespace attribute, and let the user use views to expand the attributes collected to its choosing, while we define the finest granularity. Worth mention it's not dynamic, but I think it's good enough for now. The metric filter developed in OTel designated for server side will be able to serve us here as well.



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder()
+        .build().getOpenTelemetrySdk();
+
+        PulsarClient client=PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")

Review Comment:
   you're missing a line here.



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder()
+        .build().getOpenTelemetrySdk();
+
+        PulsarClient client=PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Cardinality enum will allow to select a default cardinality label to be attached to the
+metrics:
+
+```java
+public enum MetricsCardinality {
+    /**
+     * Do not add additional labels to metrics
+     */
+    None,
+
+    /**
+     * Label metrics by tenant
+     */
+    Tenant,
+
+    /**
+     * Label metrics by tenant and namespace
+     */
+    Namespace,
+
+    /**
+     * Label metrics by topic
+     */
+    Topic,
+
+    /**
+     * Label metrics by each partition
+     */
+    Partition,
+}
+```
+
+The labels are addictive. For example, selecting `Topic` level would mean that the metrics will be
+labeled like:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public",topic="persistent://public/default/pt"} 149.0
+```
+
+While selecting `Namespace` level:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public"} 149.0
+```
+
+### Deprecating the old stats methods

Review Comment:
   I suggest 3 phases:
   1. OTel is added as experimental, but old stats are active. You can just disable old stats if you want.
   2. OTel is no longer experimental, then old stats are marked deprecated.
   3. v4 we remove old stats.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1525111856


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |

Review Comment:
   I feel this would be a bit too much information attached. We can revisit later on. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "nodece (via GitHub)" <gi...@apache.org>.
nodece commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1525118216


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |

Review Comment:
   Ok, just a suggestion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1515053007


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder()
+        .build().getOpenTelemetrySdk();
+
+        PulsarClient client=PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Cardinality enum will allow to select a default cardinality label to be attached to the
+metrics:
+
+```java
+public enum MetricsCardinality {
+    /**
+     * Do not add additional labels to metrics
+     */
+    None,
+
+    /**
+     * Label metrics by tenant
+     */
+    Tenant,
+
+    /**
+     * Label metrics by tenant and namespace
+     */
+    Namespace,
+
+    /**
+     * Label metrics by topic
+     */
+    Topic,
+
+    /**
+     * Label metrics by each partition
+     */
+    Partition,
+}
+```
+
+The labels are addictive. For example, selecting `Topic` level would mean that the metrics will be
+labeled like:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public",topic="persistent://public/default/pt"} 149.0
+```
+
+While selecting `Namespace` level:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public"} 149.0
+```
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be disabled by default, deprecated and eventually removed
+in Pulsar 4.0 release.
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+| OTel metric name                                | Type      | Unit        | Description                                                                                    |

Review Comment:
   I've updated the names here as well, based on the feedback



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1525314228


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();

Review Comment:
   Ups



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "nodece (via GitHub)" <gi...@apache.org>.
nodece commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1524453978


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | UpDownCounter | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages currently sitting in the consumer receive queue                                                                    |

Review Comment:
   `_` should be `.`?



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | UpDownCounter | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages currently sitting in the consumer receive queue                                                                    |
+| `pulsar.client.consumer.receive_queue.size`     | UpDownCounter | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The total size in bytes of messages currently sitting in the consumer receive queue                                                       |

Review Comment:
   `_` should be `.`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat merged PR #22178:
URL: https://github.com/apache/pulsar/pull/22178


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1515052491


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);

Review Comment:
   Yes, I did add that in the latest versions of the WIP PR. I'm not sure how a user would configure a different aggregation level of the metrics



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1520095637


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder()
+        .build().getOpenTelemetrySdk();
+
+        PulsarClient client=PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Cardinality enum will allow to select a default cardinality label to be attached to the
+metrics:
+
+```java
+public enum MetricsCardinality {
+    /**
+     * Do not add additional labels to metrics
+     */
+    None,
+
+    /**
+     * Label metrics by tenant
+     */
+    Tenant,
+
+    /**
+     * Label metrics by tenant and namespace
+     */
+    Namespace,
+
+    /**
+     * Label metrics by topic
+     */
+    Topic,
+
+    /**
+     * Label metrics by each partition
+     */
+    Partition,
+}
+```
+
+The labels are addictive. For example, selecting `Topic` level would mean that the metrics will be
+labeled like:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public",topic="persistent://public/default/pt"} 149.0
+```
+
+While selecting `Namespace` level:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public"} 149.0
+```
+
+### Deprecating the old stats methods

Review Comment:
   The old metric needs to be deprecated now. We can change this by leaving them on by default until next release (3.4) and removed completely in 4.0. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "KevinLiLu (via GitHub)" <gi...@apache.org>.
KevinLiLu commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1524114034


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | UpDownCounter | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages currently sitting in the consumer receive queue                                                                    |
+| `pulsar.client.consumer.receive_queue.size`     | UpDownCounter | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The total size in bytes of messages currently sitting in the consumer receive queue                                                       |
+| `pulsar.client.consumer.message.ack`            | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of acknowledged messages                                                                                                       |
+| `pulsar.client.consumer.message.nack`           | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of negatively acknowledged messages                                                                                            |
+| `pulsar.client.consumer.message.dlq`            | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages sent to DLQ                                                                                                        |
+| `pulsar.client.consumer.message.ack.timeout`    | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages that were not acknowledged in the configured timeout period, hence, were requested by the client to be redelivered |
+| `pulsar.client.producer.message.send.duration`  | Histogram     | seconds     | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | Publish latency experienced by the application, includes client batching time                                                             |
+| `pulsar.client.producer.rpc.send.duration`      | Histogram     | seconds     | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.response.status="success\|failed"`                                | Publish RPC latency experienced internally by the client when sending data to receiving an ack                                            |
+| `pulsar.client.producer.message.send.size`      | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.response.status="success\|failed"`                                | The number of bytes published                                                                                                             |

Review Comment:
   Is there a reason why we don't have producer message send count?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "nodece (via GitHub)" <gi...@apache.org>.
nodece commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1524456499


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |

Review Comment:
   `pulsar.producer.name/id` should be added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1520089395


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);

Review Comment:
   Sure. In any case, I would leave the documentation on how specifically to configure OTel out of the scope of this design doc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1524099464


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,201 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options: 
+
+ 1. Accept an `OpenTelemetry` object directly in Pulsar API
+ 2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+    provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+ 1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+    on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+    Pulsar-specific interface
+ 2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+    API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+ 3. Providing a custom interface has several drawbacks:
+     1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+     2. The surface of this plugin API can become quite big when there are several metrics
+     3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+        custom way to expose metrics, we can see the level of integration burden that is pushed to application
+        developers
+ 4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+    are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+
+    ClientBuilder openTelemetryMetricsCardinality(MetricsCardinality metricsCardinality);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel=AutoConfiguredOpenTelemetrySdk.builder()
+        .build().getOpenTelemetrySdk();
+
+        PulsarClient client=PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Cardinality enum will allow to select a default cardinality label to be attached to the
+metrics:
+
+```java
+public enum MetricsCardinality {
+    /**
+     * Do not add additional labels to metrics
+     */
+    None,
+
+    /**
+     * Label metrics by tenant
+     */
+    Tenant,
+
+    /**
+     * Label metrics by tenant and namespace
+     */
+    Namespace,
+
+    /**
+     * Label metrics by topic
+     */
+    Topic,
+
+    /**
+     * Label metrics by each partition
+     */
+    Partition,
+}
+```
+
+The labels are addictive. For example, selecting `Topic` level would mean that the metrics will be
+labeled like:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public",topic="persistent://public/default/pt"} 149.0
+```
+
+While selecting `Namespace` level:
+
+```
+pulsar_client_received_total{namespace="public/default",tenant="public"} 149.0
+```
+
+### Deprecating the old stats methods

Review Comment:
   Added note about "experimental"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "merlimat (via GitHub)" <gi...@apache.org>.
merlimat commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1524096177


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -166,36 +122,51 @@ this is the proposed initial set of metrics to export.
 Additional metrics could be added later on, though it's better to start with the set of most important metrics
 and then evaluate any missing information.
 
-| OTel metric name                                | Type      | Unit        | Description                                                                                    |
-|-------------------------------------------------|-----------|-------------|------------------------------------------------------------------------------------------------|
-| `pulsar.client.connections.opened`              | Counter   | connections | Counter of connections opened                                                                  |
-| `pulsar.client.connections.closed`              | Counter   | connections | Counter of connections closed                                                                  |
-| `pulsar.client.connections.failed`              | Counter   | connections | Counter of connections establishment failures                                                  |
-| `pulsar.client.session.opened`                  | Counter   | sessions    | Counter of sessions opened. `type="producer"` or `consumer`                                    |
-| `pulsar.client.session.closed`                  | Counter   | sessions    | Counter of sessions closed. `type="producer"` or `consumer`                                    |
-| `pulsar.client.received`                        | Counter   | messages    | Number of messages received                                                                    |
-| `pulsar.client.received`                        | Counter   | bytes       | Number of bytes received                                                                       |
-| `pulsar.client.consumer.preteched.messages`     | Gauge     | messages    | Number of messages currently sitting in the consumer pre-fetch queue                           |
-| `pulsar.client.consumer.preteched`              | Gauge     | bytes       | Total number of bytes currently sitting in the consumer pre-fetch queue                        |
-| `pulsar.client.consumer.ack`                    | Counter   | messages    | Number of ack operations                                                                       |
-| `pulsar.client.consumer.nack`                   | Counter   | messages    | Number of negative ack operations                                                              |
-| `pulsar.client.consumer.dlq`                    | Counter   | messages    | Number of messages sent to DLQ                                                                 |
-| `pulsar.client.consumer.ack.timeout`            | Counter   | messages    | Number of ack timeouts events                                                                  |
-| `pulsar.client.producer.latency`                | Histogram | seconds     | Publish latency experienced by the application, includes client batching time                  |
-| `pulsar.client.producer.rpc.latency`            | Histogram | seconds     | Publish RPC latency experienced internally by the client when sending data to receiving an ack |
-| `pulsar.client.producer.published`              | Counter   | bytes       | Bytes published                                                                                |
-| `pulsar.client.producer.pending.messages.count` | Gauge     | messages    | Pending messages for this producer                                                             |
-| `pulsar.client.producer.pending.count`          | Gauge     | bytes       | Pending bytes for this producer                                                                |
+| OTel metric name                                | Type      | Unit        | Description                                                                                                                               |
+|-------------------------------------------------|-----------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter   | connections | The number of connections opened                                                                                                          |

Review Comment:
   Added all the attributes here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [improve] PIP 342: Support OpenTelemetry metrics in Pulsar client [pulsar]

Posted by "codelipenghui (via GitHub)" <gi...@apache.org>.
codelipenghui commented on code in PR #22178:
URL: https://github.com/apache/pulsar/pull/22178#discussion_r1524470827


##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | UpDownCounter | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages currently sitting in the consumer receive queue                                                                    |

Review Comment:
   I found that the `.` will be replaced with `_` in the Prometheus exporter. I have uploaded the integration test result with Prometheus exporter https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/10849
   



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | UpDownCounter | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages currently sitting in the consumer receive queue                                                                    |
+| `pulsar.client.consumer.receive_queue.size`     | UpDownCounter | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The total size in bytes of messages currently sitting in the consumer receive queue                                                       |
+| `pulsar.client.consumer.message.ack`            | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of acknowledged messages                                                                                                       |

Review Comment:
   ```suggestion
   | `pulsar.client.consumer.message.ack.count`            | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of acknowledged messages                                                                                                       |
   ```



##########
pip/pip-342 OTel client metrics support.md:
##########
@@ -0,0 +1,167 @@
+# PIP 342: Support OpenTelemetry metrics in Pulsar client
+
+## Motivation
+
+Current support for metric instrumentation in Pulsar client is very limited and poses a lot of
+issues for integrating the metrics into any telemetry system.
+
+We have 2 ways that metrics are exposed today:
+
+1. Printing logs every 1 minute: While this is ok as it comes out of the box, it's very hard for
+   any application to get the data or use it in any meaningful way.
+2. `producer.getStats()` or `consumer.getStats()`: Calling these methods will get access to
+   the rate of events in the last 1-minute interval. This is problematic because out of the
+   box the metrics are not collected anywhere. One would have to start its own thread to
+   periodically check these values and export them to some other system.
+
+Neither of these mechanism that we have today are sufficient to enable application to easily
+export the telemetry data of Pulsar client SDK.
+
+## Goal
+
+Provide a good way for applications to retrieve and analyze the usage of Pulsar client operation,
+in particular with respect to:
+
+1. Maximizing compatibility with existing telemetry systems
+2. Minimizing the effort required to export these metrics
+
+## Why OpenTelemetry?
+
+[OpenTelemetry](https://opentelemetry.io/) is quickly becoming the de-facto standard API for metric and
+tracing instrumentation. In fact, as part of [PIP-264](https://github.com/apache/pulsar/blob/master/pip/pip-264.md),
+we are already migrating the Pulsar server side metrics to use OpenTelemetry.
+
+For Pulsar client SDK, we need to provide a similar way for application builder to quickly integrate and
+export Pulsar metrics.
+
+### Why exposing OpenTelemetry directly in Pulsar API
+
+When deciding how to expose the metrics exporter configuration there are multiple options:
+
+1. Accept an `OpenTelemetry` object directly in Pulsar API
+2. Build a pluggable interface that describe all the Pulsar client SDK events and allow application to
+   provide an implementation, perhaps providing an OpenTelemetry included option.
+
+For this proposal, we are following the (1) option. Here are the reasons:
+
+1. In a way, OpenTelemetry can be compared to [SLF4J](https://www.slf4j.org/), in the sense that it provides an API
+   on top of which different vendor can build multiple implementations. Therefore, there is no need to create a new
+   Pulsar-specific interface
+2. OpenTelemetry has 2 main artifacts: API and SDK. For the context of Pulsar client, we will only depend on its
+   API. Applications that are going to use OpenTelemetry, will include the OTel SDK
+3. Providing a custom interface has several drawbacks:
+    1. Applications need to update their implementations every time a new metric is added in Pulsar SDK
+    2. The surface of this plugin API can become quite big when there are several metrics
+    3. If we imagine an application that uses multiple libraries, like Pulsar SDK, and each of these has its own
+       custom way to expose metrics, we can see the level of integration burden that is pushed to application
+       developers
+4. It will always be easy to use OpenTelemetry to collect the metrics and export them using a custom metrics API. There
+   are several examples of this in OpenTelemetry documentation.
+
+## Public API changes
+
+### Enabling OpenTelemetry
+
+When building a `PulsarClient` instance, it will be possible to pass an `OpenTelemetry` object:
+
+```java
+interface ClientBuilder {
+    // ...
+    ClientBuilder openTelemetry(io.opentelemetry.api.OpenTelemetry openTelemetry);
+}
+```
+
+The common usage for an application would be something like:
+
+```java
+// Creates a OpenTelemetry instance using environment variables to configure it
+OpenTelemetry otel = AutoConfiguredOpenTelemetrySdk.builder().build()
+        .getOpenTelemetrySdk();
+
+PulsarClient client = PulsarClient.builder()
+        .serviceUrl("pulsar://localhost:6650")
+        .build();
+
+// ....
+```
+
+Even without passing the `OpenTelemetry` instance to Pulsar client SDK, an application using the OpenTelemetry
+agent, will be able to instrument the Pulsar client automatically, because we default to use `GlobalOpenTelemetry.get()`. 
+
+### Deprecating the old stats methods
+
+The old way of collecting stats will be deprecated in phases:
+ 1. Pulsar 3.3 - Old metrics deprecated, still enabled by default
+ 2. Pulsar 3.4 - Old metrics disabled by default
+ 3. Pulsar 4.0 - Old metrics removed
+
+Methods to deprecate:
+
+```java
+interface ClientBuilder {
+    // ...
+    @Deprecated
+    ClientBuilder statsInterval(long statsInterval, TimeUnit unit);
+}
+
+interface Producer {
+    @Deprecated
+    ProducerStats getStats();
+}
+
+interface Consumer {
+    @Deprecated
+    ConsumerStats getStats();
+}
+```
+
+## Initial set of metrics to include
+
+Based on the experience of Pulsar Go client SDK metrics (
+see: https://github.com/apache/pulsar-client-go/blob/master/pulsar/internal/metrics.go),
+this is the proposed initial set of metrics to export.
+
+Additional metrics could be added later on, though it's better to start with the set of most important metrics
+and then evaluate any missing information.
+
+These metrics names and attributes will be considered "Experimental" for 3.3 release and might be subject to changes.
+The plan is to finalize all the namings in 4.0 LTS release.
+
+Attributes with `[name]` brackets will not be included by default, to avoid high cardinality metrics.
+
+| OTel metric name                                | Type          | Unit        | Attributes                                                                                                                                             | Description                                                                                                                               |
+|-------------------------------------------------|---------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `pulsar.client.connection.opened`               | Counter       | connections |                                                                                                                                                        | The number of connections opened                                                                                                          |
+| `pulsar.client.connection.closed`               | Counter       | connections |                                                                                                                                                        | The number of connections closed                                                                                                          |
+| `pulsar.client.connection.failed`               | Counter       | connections |                                                                                                                                                        | The number of failed connection attempts                                                                                                  |
+| `pulsar.client.producer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions opened                                                                                                    |
+| `pulsar.client.producer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`]                                                                            | The number of producer sessions closed                                                                                                    |
+| `pulsar.client.consumer.opened`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions opened                                                                                                    |
+| `pulsar.client.consumer.closed`                 | Counter       | sessions    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of consumer sessions closed                                                                                                    |
+| `pulsar.client.consumer.message.received.count` | Counter       | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages explicitly received by the consumer application                                                                    |
+| `pulsar.client.consumer.message.received.size`  | Counter       | bytes       | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of bytes explicitly received by the consumer application                                                                       |
+| `pulsar.client.consumer.receive_queue.count`    | UpDownCounter | messages    | `pulsar.tenant`, `pulsar.namespace`, [`pulsar.topic`], [`pulsar.partition`], `pulsar.subscription`                                                     | The number of messages currently sitting in the consumer receive queue                                                                    |

Review Comment:
   Is it better to use `pulsar.client.consumer.receiver.queue.usage`? and we can also consider to expose `pulsar.client.consumer.receiver.queue.limit`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org