Posted to issues@nifi.apache.org by "Corey Fritz (JIRA)" <ji...@apache.org> on 2018/08/25 00:53:00 UTC

[jira] [Comment Edited] (NIFI-5535) DataDogReportingTask is not tagging metrics properly

    [ https://issues.apache.org/jira/browse/NIFI-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592352#comment-16592352 ] 

Corey Fritz edited comment on NIFI-5535 at 8/25/18 12:52 AM:
-------------------------------------------------------------

So I attempted to fix the tagging issue, and actually did, but the fix just exacerbated another problem: the DataDogReportingTask is sending way too many metrics, with way too many tags. Each processor generates 6 metrics with 2 tags each. Each port generates 9 metrics with 5 tags each. Each connection generates 6 metrics with 8 tags each. On top of that there are 10 aggregated flow-level metrics and 13 JVM metrics, each with 2 tags. Datadog considers each unique combination of a metric name + tag to be a "custom metric", and its lowest-tier plan allows an average of 100 "custom metrics" per host (some hosts can have more and some less, as long as the average works out to 100 per host).

I have a flow with about 30 processors that resulted in 370 metrics (I didn't bother to figure out how many tags) being sent to Datadog. I noticed that some of the metrics I actually wanted to monitor were not showing up in Datadog, and I'm sure it's because we're way over our limit. There should probably be an opt-in strategy for identifying which sets of metrics we want to send to Datadog.
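
To put rough numbers on that, here's a back-of-the-envelope sketch; only the ~30-processor count is from my flow, and the port and connection counts below are made up purely for illustration:

{code:java}
public class CustomMetricEstimate {
    public static void main(String[] args) {
        // Only the processor count reflects my flow; ports/connections are invented.
        int processors = 30, ports = 5, connections = 20;

        int processorMetrics  = processors  * 6;  // 6 metrics per processor
        int portMetrics       = ports       * 9;  // 9 metrics per port
        int connectionMetrics = connections * 6;  // 6 metrics per connection
        int flowAndJvmMetrics = 10 + 13;          // aggregated flow-level + JVM metrics

        int totalMetrics = processorMetrics + portMetrics
                + connectionMetrics + flowAndJvmMetrics;

        // Datadog treats each unique metric name + tag combination as a
        // "custom metric", so the per-metric tag counts multiply the totals.
        int customMetrics = processorMetrics * 2   // 2 tags per processor metric
                + portMetrics * 5                  // 5 tags per port metric
                + connectionMetrics * 8            // 8 tags per connection metric
                + flowAndJvmMetrics * 2;           // 2 tags each

        System.out.println(totalMetrics + " metrics, roughly " + customMetrics
                + " custom metrics, against a ~100/host allowance");
    }
}
{code}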

So... my proposal is this (and I'm willing to tackle this as time allows):

1. Add an _Enable Monitoring_ property to all processors that is off by default

2. Add an _Enable Monitoring_ property to all ports that is off by default

3. Add an _Enable Monitoring_ property to all connections that is off by default

4. Add the following properties to the DataDogReportingTask (see the sketch after this list):
 * _Enable Flow-level Monitoring_, off by default
 * _Enable JVM Monitoring_, off by default

5. Update the DataDogReportingTask to only submit metrics for components that have had monitoring explicitly enabled

6. Update the DataDogReportingTask to remove all metric tags except for _Environment_. I just don't see much value in any of the other tags.
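
For item 4, here's a rough sketch of what the two new reporting-task properties might look like, using the standard NiFi PropertyDescriptor builder; the names, descriptions, and defaults are placeholders to frame the discussion, not a final design:

{code:java}
import org.apache.nifi.components.PropertyDescriptor;

public class DataDogReportingTaskPropertiesSketch {

    static final PropertyDescriptor ENABLE_FLOW_LEVEL_MONITORING = new PropertyDescriptor.Builder()
            .name("Enable Flow-level Monitoring")
            .description("When false, aggregated flow-level metrics are not sent to Datadog.")
            .allowableValues("true", "false")
            .defaultValue("false") // off by default, per the proposal
            .required(true)
            .build();

    static final PropertyDescriptor ENABLE_JVM_MONITORING = new PropertyDescriptor.Builder()
            .name("Enable JVM Monitoring")
            .description("When false, JVM metrics are not sent to Datadog.")
            .allowableValues("true", "false")
            .defaultValue("false") // off by default, per the proposal
            .required(true)
            .build();
}
{code}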

This seems like a pretty large refactoring with a wide scope, since it would touch processors, ports, and connections, as well as the other metric reporting services, so I'd like to discuss it further with someone before proceeding.



> DataDogReportingTask is not tagging metrics properly
> ----------------------------------------------------
>
>                 Key: NIFI-5535
>                 URL: https://issues.apache.org/jira/browse/NIFI-5535
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.7.1
>            Reporter: Corey Fritz
>            Priority: Major
>         Attachments: Screen Shot 2018-08-19 at 12.33.58 AM.png
>
>
> The current (and, it looks like, original) implementation of the DataDogReportingTask is not applying metric tags correctly, and as a result the "Environment" configuration property on that task does not work. This means you won't be able to use tags to differentiate the metric values coming from different environments.
> Currently, every metric reported by this task gets the same set of tags applied:
> {code:java}
> connection-destination-id
> connection-destination-name
> connection-group-id
> connection-id
> connection-name
> connection-source-id
> connection-source-name
> dataflow_id
> env
> port-group-id
> port-id
> port-name{code}
> This list is defined here: [https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-datadog-bundle/nifi-datadog-reporting-task/src/main/java/org/apache/nifi/reporting/datadog/metrics/MetricsService.java#L111-L126]
> I've attached a screenshot from Datadog demonstrating a JVM metric with all of these tags applied.
> Each of these tags should include a value, e.g. "env:dev" instead of just "env".
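> As an illustration only (the helper below is hypothetical and not part of the existing MetricsService), the tags would need to be built as key:value pairs rather than bare keys:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> // Hypothetical illustration: Datadog tags need to be "key:value" strings
> // ("env:dev"), not bare keys ("env"), for filtering and grouping to work.
> public class TagFormatExample {
>     static String tag(String key, String value) {
>         return key + ":" + value;
>     }
>
>     public static void main(String[] args) {
>         List<String> tags = new ArrayList<>();
>         tags.add(tag("env", "dev"));            // instead of just "env"
>         tags.add(tag("dataflow_id", "abc123")); // instead of just "dataflow_id"
>         System.out.println(tags);               // [env:dev, dataflow_id:abc123]
>     }
> }
> {code}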
> Other observations:
>  * it doesn't make sense to attach the _connection-_ and _port-_ tags to JVM metrics
>  * I'm not sure I see any value in the _dataflow_id_ tag
> I was hoping for a quick fix when I noticed the environment tagging wasn't working, but after reviewing the code I think a not insignificant refactoring will be required. I'll try to tackle this if/when time allows.
> See here for more context on Datadog tagging: [https://docs.datadoghq.com/tagging]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)