Posted to commits@airflow.apache.org by "syun64 (via GitHub)" <gi...@apache.org> on 2023/02/15 20:29:58 UTC

[GitHub] [airflow] syun64 commented on pull request #28961: Emit DataDog statsd metrics with metadata tags

syun64 commented on PR #28961:
URL: https://github.com/apache/airflow/pull/28961#issuecomment-1431989563

   @potiuk @hussein-awala @uranusjr 
   
   I've started working with our backend folks to register the new metric tags so that we can read the soon-to-be-published metrics. In the process, I was reminded that metric cardinality is a concern for both storage space and the retention period of the tags. I can't speak for other infrastructures, but for us, the cardinality of a metric is measured as:
   
   ```
   number of unique metric names * number of unique application tag pairs
   ```
   
   Introducing tags to existing metrics whose names already have these values concatenated into them doesn't increase the cardinality by much (it only doubles, because the same events are now emitted as duplicate metrics). But as a rule of thumb, I think we would benefit from carefully analyzing the potential for cardinality explosion from each new tag.
   
   As an example, my only concern with this PR is the new tag attribute 'run_id' which is unique for every single dag_run, and hence increases the cardinality by the number of unique scheduled dag_runs during a retention period.
   
   This means that for an Airflow instance with 1,000 daily jobs and a metric retention period of 10 days, we increase the cardinality of our metrics by 10,000 (1,000 runs per day × 10 days) just by adding this tag. As a benchmark, our allocated quota for metric cardinality is 100,000 per tenancy, and I'm wondering whether other tag users may face similar storage-based concerns.
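
To make the arithmetic above easy to rerun with other numbers, here is a rough sketch. The function names are hypothetical and not part of Airflow or DataDog. It uses the same simplification as the estimate above, treating each unique run_id as one additional unit of cardinality; how a given backend actually counts name and tag combinations may differ.

```python
# Back-of-the-envelope estimate of the cardinality added by a per-run tag.
# Assumption: every dag_run contributes exactly one new unique run_id value,
# and that value stays counted until the retention period expires.


def added_run_id_values(daily_dag_runs: int, retention_days: int) -> int:
    """Number of unique run_id values alive within one retention window."""
    return daily_dag_runs * retention_days


def quota_fraction(added_cardinality: int, quota: int) -> float:
    """Share of the cardinality quota consumed by the added values."""
    return added_cardinality / quota


# Figures from the comment: 1,000 daily jobs, 10-day retention, 100,000 quota.
new_values = added_run_id_values(daily_dag_runs=1000, retention_days=10)
print(new_values)                           # 10000
print(quota_fraction(new_values, 100_000))  # 0.1
```

With these figures the run_id tag alone would account for a tenth of the quota, before counting any other tag.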
   
   Could I get your thoughts on this? Is there room to discuss, and potentially back out, the addition of run_id as a metric tag in the upcoming release?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org