Posted to issues@nifi.apache.org by "Matt Burgess (Jira)" <ji...@apache.org> on 2021/03/02 02:35:00 UTC
[jira] [Updated] (NIFI-4713) Datadog Metrics Alignment

     [ https://issues.apache.org/jira/browse/NIFI-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Burgess updated NIFI-4713:
-------------------------------
    Affects Version/s:     (was: 1.4.0)
               Status: Patch Available  (was: Open)

> Datadog Metrics Alignment
> -------------------------
>
>                 Key: NIFI-4713
>                 URL: https://issues.apache.org/jira/browse/NIFI-4713
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Robert Batts
>            Priority: Major
>              Labels: datadog, metrics
>
> Metrics that are being fed into Datadog from NiFi do not seem to align to the NiFi model. Therefore, I am proposing the following:
> # Change the metric names to work better with Datadog
> # Become more reliant on tagging
> # Allow custom tagging
> Currently, metrics are being sent to Datadog in the following format:
> <metricsPrefix>.<processorName/flow>.<metricName>
> However, Datadog works best when a metric name is reused and the results are filtered via tags. So in Datadog, a metric named <metricsPrefix>.<metricName> with a tag of <processorName> works better than one unique metric per processor (when there is no processorName, omit the tag instead of adding 'flow').
> Consider how Datadog handles Kafka. The metric kafka.consumer_lag represents the current lag of a topic (tag) for a given consumer_group (tag) over all partitions (tag).
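The naming change described above can be sketched as follows. This is a minimal Python illustration, not the NiFi Java code: the function names, the `processor` tag key, and the sample values (`nifi`, `BytesRead`, `GetSQS`) are all assumptions made up for the example.

```python
def legacy_metric(prefix, metric_name, processor_name=None):
    """Current scheme: one metric name per processor, 'flow' when there is none."""
    return f"{prefix}.{processor_name or 'flow'}.{metric_name}", []


def tagged_metric(prefix, metric_name, processor_name=None):
    """Proposed scheme: shared metric name, processor carried as a tag.

    The 'processor' tag key is a hypothetical choice, not taken from the issue.
    """
    tags = [f"processor:{processor_name}"] if processor_name else []
    return f"{prefix}.{metric_name}", tags


print(legacy_metric("nifi", "BytesRead", "GetSQS"))
print(tagged_metric("nifi", "BytesRead", "GetSQS"))
print(tagged_metric("nifi", "BytesRead"))  # no processor: the tag is omitted
```

With the tagged form, every processor reports under the same metric name, so the name can be reused and the processor selected by tag.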
> For the same moment in time:
> kafka.consumer_lag = 5 <topic:a, consumer_group:nifi, partition:0>
> kafka.consumer_lag = 7 <topic:a, consumer_group:nifi, partition:1>
> kafka.consumer_lag = 22 <topic:a, consumer_group:python, partition:0>
> kafka.consumer_lag = 19 <topic:a, consumer_group:python, partition:1>
> kafka.consumer_lag = 2 <topic:b, consumer_group:nifi, partition:0>
> If I wanted to know the current lag for a given consumer_group on all topics, I would filter on those tags and then sum over the remaining records (that is, across the partitions).
> For the same moment in time:
> kafka.consumer_lag = 12 for topic:a and consumer_group:nifi
> kafka.consumer_lag = 2 for topic:b and consumer_group:nifi
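The filter-then-sum step in the Kafka example can be reproduced with a small Python sketch. The data points are copied from the example above; the `sum_by` helper is a hypothetical stand-in for what a Datadog query does, not a Datadog API.

```python
from collections import defaultdict

# Sample points copied from the Kafka example: (value, tags).
POINTS = [
    (5, {"topic": "a", "consumer_group": "nifi", "partition": "0"}),
    (7, {"topic": "a", "consumer_group": "nifi", "partition": "1"}),
    (22, {"topic": "a", "consumer_group": "python", "partition": "0"}),
    (19, {"topic": "a", "consumer_group": "python", "partition": "1"}),
    (2, {"topic": "b", "consumer_group": "nifi", "partition": "0"}),
]


def sum_by(points, filters, group_by):
    """Keep points matching every filter tag, then sum values per group_by tag."""
    totals = defaultdict(int)
    for value, tags in points:
        if all(tags.get(k) == v for k, v in filters.items()):
            totals[tags[group_by]] += value
    return dict(totals)


lag_per_topic = sum_by(POINTS, {"consumer_group": "nifi"}, "topic")
print(lag_per_topic)  # {'a': 12, 'b': 2}
```

The partition tag is neither filtered nor grouped on, so its values are summed away, which gives the per-topic totals listed above.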
> In NiFi terms, this could let you (for example) tag a metric as an aws-sqs pull and then aggregate the average number of records being pulled across the entire system instead of on a single process.
> Additionally, there is room for custom tagging. For example, I want to be able to aggregate across all NiFi clusters I control. Setting a unique prefix for each cluster breaks this aggregation, while setting no prefix might not allow me to filter properly later. But if custom tagging were allowed, I could set a tag of cluster_name:nifi-1, have all metrics aggregated, and still filter down to that specific cluster for other operations. In my opinion, the easiest way to implement this would be to take all non-required attributes from the Datadog controller and use them as the custom tags (these attributes should be considered final/static once loaded). The attributes are already in Key=Value format, so it should be easy enough to convert them to Key:Value format for tagging (once the required attributes are removed).
> (Most if not all work for this is centered on org.apache.nifi.reporting.datadog.DataDogReportingTask)
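The custom-tagging proposal (drop the required attributes, then turn each remaining Key=Value pair into a Key:Value tag) could look roughly like the Python sketch below. It is an illustration only: the required key names and the `custom_tags` helper are assumptions, and the real property names in DataDogReportingTask may differ.

```python
# Hypothetical names for the controller's required settings; the actual
# property keys in DataDogReportingTask may differ.
REQUIRED_KEYS = frozenset({"Metrics prefix", "API key", "Environment"})


def custom_tags(properties, required_keys=REQUIRED_KEYS):
    """Turn every non-required Key=Value property into a key:value tag."""
    return sorted(
        f"{key}:{value}"
        for key, value in properties.items()
        if key not in required_keys
    )


props = {"Metrics prefix": "nifi", "API key": "xxx", "cluster_name": "nifi-1"}
print(custom_tags(props))  # ['cluster_name:nifi-1']
```

Computing the tag list once when the properties are loaded would match the suggestion that these attributes be treated as final/static.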


