Posted to dev@nifi.apache.org by Noble Numbat <no...@gmail.com> on 2021/02/09 07:48:36 UTC

Adding metrics to Prometheus endpoint in the API

Hi everyone,

We have added metrics to the Prometheus metrics endpoint in the API
(/nifi-api/flow/metrics/prometheus) to improve programmatic access to
NiFi metrics for the purpose of monitoring. We’d like to contribute
these back to the project for the benefit of others. Please find the
list of metrics below.

Before I open a JIRA ticket and pull request, I have some questions to
clarify my understanding and determine what else I will need to add to
the code.
1. How are the use cases different between the Prometheus metrics
endpoint in the API (/nifi-api/flow/metrics/prometheus) and the
PrometheusReportingTask? I note that the metrics are almost identical
between the two.
2. Is the intent to keep the metrics in these two endpoints the same?
That is, if we add metrics to the Prometheus metrics endpoint in the
API, are we expected to add these to the PrometheusReportingTask as
well?
3. If so, one way to get the metrics data into
PrometheusReportingTask.java is to make an API call to
/nifi-api/controller/config. Is that an acceptable way to get metrics
data for max_event_driven_threads and max_timer_driven_threads?

For context, here are the metrics we’ve added:
nifi_repository_max_bytes{flowfile}
nifi_repository_max_bytes{content}
nifi_repository_max_bytes{provenance}
nifi_repository_used_bytes{flowfile}
nifi_repository_used_bytes{content}
nifi_repository_used_bytes{provenance}
jvm_deadlocked_thread_count
max_event_driven_threads
max_timer_driven_threads
jvm_heap_non_init
jvm_heap_non_committed
jvm_heap_non_max
jvm_heap_non_used
jvm_heap_committed
jvm_heap_init
jvm_heap_max

thanks

Re: Adding metrics to Prometheus endpoint in the API

Posted by Kevin Doran <kd...@gmail.com>.
Hello and thanks for your interest in contributing to NiFi.

If my understanding is correct, PrometheusReportingTask is an implementation of the ReportingTask interface that exposes data available in the ReportingContext and other internal interfaces via the Prometheus Meter Registry and, ultimately, the /metrics/prometheus endpoint.

Personally, I would be fine with adding additional metrics to the Prometheus metrics endpoint that are not part of a standard ReportingTask, where it makes sense. I do not think ReportingTask implementations such as PrometheusReportingTask should access information via an HTTP call to the NiFi REST API. Rather, we should expose the proper internal APIs and contextual data to make it available.

The approaches I can think of are:
- extending the ReportingContext to add the desired fields
- adding, in the nifi-prometheus-reporting-task implementation, metrics to the Prometheus meter registry/endpoint that are not available to other reporting tasks
- adding a controller service that provides a shared instance of the Prometheus Meter Registry to other components, and then adding components that write to that registry data that is not part of the ReportingContext (and does not make sense to add to the ReportingContext, but would be nice to have via Prometheus scraping); see the sketch below
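
To make that last option a bit more concrete, here is a minimal,
hypothetical sketch; the PrometheusRegistryService name and the use of
the simpleclient CollectorRegistry are assumptions for illustration,
not existing NiFi code:

import io.prometheus.client.CollectorRegistry;
import org.apache.nifi.controller.ControllerService;

// Hypothetical controller service (name and registry type are assumptions): components that want
// to publish extra metrics would look this service up and register collectors against the shared
// registry that the Prometheus endpoint/reporting task ultimately serves.
public interface PrometheusRegistryService extends ControllerService {
    CollectorRegistry getSharedRegistry();
}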

I hope this helps.
Kevin


Re: Adding metrics to Prometheus endpoint in the API

Posted by Kevin Doran <kd...@apache.org>.
Thanks for clarifying, Matt! 

Please ignore everything I said 😶


Re: Adding metrics to Prometheus endpoint in the API

Posted by Matt Burgess <ma...@apache.org>.
Noble,

We have to expose those via the interface vs trying to cast to some
expected object. For things like NIFI-8239 [1] we'll expose new
properties from an object the rest of the code can't get to (and
rightfully so), and I suspect the same would go for the properties of
FlowController that you're looking to expose. If NIFI-8239 isn't
capturing what you want, please feel free to add to that Jira or
create a New Feature case to describe what you're looking for.
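
For illustration only (this is not the NIFI-8239 design, and every
name below is an assumption), exposing it via the interface means the
framework would hand reporting tasks an accessor along these lines, so
no cast to framework-internal classes is needed:

// Purely hypothetical accessor interface; the name and methods are illustrative assumptions,
// not part of nifi-api today.
public interface SystemResourceStatus {
    int getMaxTimerDrivenThreadCount();
    int getMaxEventDrivenThreadCount();
    long getContentRepositoryUsedBytes();
    long getFlowFileRepositoryUsedBytes();
    long getProvenanceRepositoryUsedBytes();
}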

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-8239


Re: Adding metrics to Prometheus endpoint in the API

Posted by Noble Numbat <no...@gmail.com>.
Hi Matt

Thanks for the clarification – I see your point. Both thread metrics
[1] are constant and at this stage we are trying to get them in the
right location.

We have updated the appropriate Registry classes and
PrometheusMetricsUtil, and the additional JVM metrics now appear in
both the REST endpoint and the Reporting Task. The problem we
encountered is that the data for the thread [1] and repository [2]
metrics isn't available from the ReportingContext [3] interface, and
we are reluctant to add methods to that interface because it is also
implemented by other classes that don’t contain the relevant fields.

The actual object passed to the PrometheusReportingTask’s onTrigger
method [4] is a StandardReportingContext [5], and that class has the
FlowController field containing the thread and repository information
we need. I tried to cast the passed object to StandardReportingContext
so I could access the information, but got a
java.lang.ClassCastException, which I believe is caused by the classes
being loaded by different ClassLoaders. The stack trace is below [6].
I tried to resolve this by adding a file named
org.apache.nifi.controller.reporting.ReportingContext under the
META-INF/services directory [7] to force the ClassLoader to load this
class. The file contained the line
org.apache.nifi.controller.reporting.StandardReportingContext. The
ClassCastException didn’t change.
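
To be concrete, the cast that fails is essentially the following (a
minimal sketch of what we tried, not a recommendation):

import org.apache.nifi.controller.reporting.StandardReportingContext;
import org.apache.nifi.reporting.ReportingContext;

// Sketch of the attempted cast. It compiles when nifi-framework-core is on the compile classpath,
// but at runtime the instance handed to the reporting task was loaded by the framework's
// ClassLoader while the NAR sees its own copy of the class, so the JVM treats them as different
// types and the cast throws ClassCastException even though the fully qualified names match.
public class CastSketch {
    void readFrameworkFields(final ReportingContext context) {
        final StandardReportingContext standardContext = (StandardReportingContext) context;
        // ...from here we had hoped to reach the FlowController's thread and repository fields...
    }
}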

How would you suggest accessing the maxTimerDrivenThreads,
maxEventDrivenThreads, contentRepository, flowFileRepository and
provenanceRepository fields that are available via the FlowController
instance that is in StandardReportingContext when it is passed as a
ReportingContext?

thanks

[1] max_timer_driven_threads and max_event_driven_threads
[2] contentRepository, flowFileRepository and provenanceRepository
[3] https://github.com/apache/nifi/blob/main/nifi-api/src/main/java/org/apache/nifi/reporting/ReportingContext.java
[4] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-prometheus-bundle/nifi-prometheus-reporting-task/src/main/java/org/apache/nifi/reporting/prometheus/PrometheusReportingTask.java#L196
[5] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/reporting/StandardReportingContext.java
[6] 2021-02-12 11:15:29,434 ERROR [Timer-Driven Process Thread-3]
o.a.n.r.p.PrometheusReportingTask
PrometheusReportingTask[id=938b946e-0177-1000-221f-68f6ba8694a6] :
java.lang.ClassCastException:
org.apache.nifi.controller.reporting.StandardReportingContext cannot
be cast to org.apache.nifi.controller.reporting.StandardReportingContext
java.lang.ClassCastException:
org.apache.nifi.controller.reporting.StandardReportingContext cannot
be cast to org.apache.nifi.controller.reporting.StandardReportingContext
at org.apache.nifi.reporting.prometheus.PrometheusReportingTask.onTrigger(PrometheusReportingTask.java:296)
at org.apache.nifi.controller.tasks.ReportingTaskWrapper.run(ReportingTaskWrapper.java:44)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[7] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/resources/META-INF/services




Re: Adding metrics to Prometheus endpoint in the API

Posted by Matt Burgess <ma...@apache.org>.
The PrometheusReportingTask and the NiFi API endpoint for Prometheus
metrics are different beasts but they use quite a bit of the same code
[1]. The intent is to report on the same metrics wherever possible,
and I think for the most part we've done that. They don't call each
other, instead they get their own copies of the metrics registries,
and they populate them when triggered. For the REST endpoint, it's
done on-demand. For the Reporting Task, it's done when scheduled. The
Reporting Task came first to provide a way for Prometheus to scrape a
NiFi instance. But as Reporting Tasks are system-level controller
services, they don't get exported to templates, possibly require
configuration after manual instantiation, etc. To that end the REST
endpoint was added, as it gets all the security, configuration, and
HTTP server "for free" so to speak. Also I think the "totals" metrics
might be for the whole cluster where the Reporting Task might only be
for the node, but I'm not positive.

For some of the metrics you added, aren't they constants based on
properties or other settings? If so, we probably didn't add them
because they aren't useful metrics on their own, but there is
precedent for adding such static metrics for the purposes of
downstream queries (used / max * 100%, for example).

The other ones (besides repository info) were possibly just
oversights, but if they are helpful metrics, then please feel free to
add them. You should find that you can update the appropriate Registry
classes as well as PrometheusMetricsUtil in nifi-prometheus-utils, and
if no new registries are added, I believe both the REST endpoint and
the Reporting Task will have the new metrics. If you do need to add a
registry (NiFiRepositoryMetricsRegistry for example), you'd want to
follow the same pattern as the others and make the call to
PrometheusMetricsUtil.createNiFiRepositoryMetrics() from both the
endpoint [2] and the reporting task [3].
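
If it helps, here is a rough sketch of what such a registry class
could look like, written against the Prometheus simpleclient API; the
class name follows the example above, and everything else is an
assumption for illustration rather than the existing
nifi-prometheus-utils code:

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;

// Illustrative sketch only: one gauge per metric family, labelled by repository type
// (flowfile, content, provenance), matching the metric names from the original post.
public class NiFiRepositoryMetricsRegistry {
    private final CollectorRegistry registry = new CollectorRegistry();

    private final Gauge maxBytes = Gauge.build()
            .name("nifi_repository_max_bytes")
            .help("Maximum usable space, in bytes, of the repository")
            .labelNames("repository")
            .register(registry);

    private final Gauge usedBytes = Gauge.build()
            .name("nifi_repository_used_bytes")
            .help("Used space, in bytes, of the repository")
            .labelNames("repository")
            .register(registry);

    public CollectorRegistry getRegistry() {
        return registry;
    }

    public void setRepositorySizes(final String repository, final double maxSpace, final double usedSpace) {
        maxBytes.labels(repository).set(maxSpace);
        usedBytes.labels(repository).set(usedSpace);
    }
}

A helper like PrometheusMetricsUtil.createNiFiRepositoryMetrics()
would then populate this registry from both call sites linked above.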

Last thing I'll mention is that we're using Dropwizard for most of
these metrics, currently at version 4.1.2, while the latest 4.1.x
release is 4.1.17. We might consider an upgrade while adding these
metrics; not much has been done in the metrics-jvm module since 4.1.2,
but a couple of new metrics were added [4], and we could expose those
as well.

Regards,
Matt

[1] https://github.com/apache/nifi/tree/main/nifi-nar-bundles/nifi-extension-utils/nifi-prometheus-utils/src/main/java/org/apache/nifi/prometheus/util
[2] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/StandardNiFiServiceFacade.java#L5380
[3] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-prometheus-bundle/nifi-prometheus-reporting-task/src/main/java/org/apache/nifi/reporting/prometheus/PrometheusReportingTask.java#L134
[4] https://github.com/dropwizard/metrics/commit/ccc91ef1ade1975d58595f23caa48d5ed68a6b54#diff-42e4dfff08e984191adc05ecf744f324f7a9039f72e26bafcb779876584e9e7b
