You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "syun64 (via GitHub)" <gi...@apache.org> on 2023/03/02 16:56:14 UTC

[GitHub] [airflow] syun64 opened a new pull request, #29881: Blocklist to disable specific metric tags or metric names

syun64 opened a new pull request, #29881:
URL: https://github.com/apache/airflow/pull/29881

   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   closes: #29663
   ---
   
   This PR adds two additional configuration parameters:
   ```
   statsd_disabled_tags:
         description: |
           If you want to avoid sending all the available metrics tags to StatsD,
           you can configure a blocklist of prefixes (comma separated) to filter out metric tags 
           that start with the elements of the list (e.g: "job_id,run_id")
         version_added: 2.6.0
         type: string
         example: job_id,run_id,dag_id,task_id
         default: job_id,run_id
   
   statsd_invert_allow_list:
         description: |
           Set to True to invert the functionality of statsd_invert_allow_list to a blocklist
         version_added: 2.6.0
         type: boolean
         example: ~
         default: "False"
   ```
   
   **statsd_disabled_tags** allows users to define a blocklist of tag name prefixes to disable.
   
   **statsd_invert_allow_list** is a boolean parameter that converts the functionality of statsd_invert_allow_list from an allowlist to a blocklist.
   
   The reason we need the metric name blocklist on top of a blocklist just for metric tags is because there are legacy metric names that have high cardinality under the definition explored in #29663 like: [`local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code>`](https://github.com/apache/airflow/blob/main/airflow/jobs/local_task_job.py#L292) that users might want to disable to reduce their metric storage costs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] hussein-awala commented on a diff in pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on code in PR #29881:
URL: https://github.com/apache/airflow/pull/29881#discussion_r1125751437


##########
airflow/stats.py:
##########
@@ -291,10 +315,11 @@ def wrapper(self, stat=None, *args, tags=None, **kwargs):
         if self.influxdb_tags_enabled:
             if stat is not None and tags is not None:
                 for k, v in tags.items():
-                    if all((c not in [",", "="] for c in v + k)):
-                        stat += f",{k}={v}"
-                    else:
-                        log.error("Dropping invalid tag: %s=%s.", k, v)
+                    if self.metric_tags_validator.test(k):

Review Comment:
   Why do we need to check if the tag is filtered here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] syun64 commented on pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "syun64 (via GitHub)" <gi...@apache.org>.
syun64 commented on PR #29881:
URL: https://github.com/apache/airflow/pull/29881#issuecomment-1452202295

   @hussein-awala @potiuk may I ask for your review on this PR? Thank you both very much for your input!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] hussein-awala commented on a diff in pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on code in PR #29881:
URL: https://github.com/apache/airflow/pull/29881#discussion_r1125753666


##########
airflow/stats.py:
##########
@@ -291,10 +315,11 @@ def wrapper(self, stat=None, *args, tags=None, **kwargs):
         if self.influxdb_tags_enabled:
             if stat is not None and tags is not None:
                 for k, v in tags.items():
-                    if all((c not in [",", "="] for c in v + k)):
-                        stat += f",{k}={v}"
-                    else:
-                        log.error("Dropping invalid tag: %s=%s.", k, v)
+                    if self.metric_tags_validator.test(k):

Review Comment:
   My bad, I thought that this filter:
   ```python
               tags_list = [
                   f"{key}:{value}" for key, value in tags.items() if self.metric_tags_validator.test(key)
               ]
   ```
   is used for the two clients datadog and influxdb, but just found that is just used for datadog, and `if self.metric_tags_validator.test(k):` is used for influxdb



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] syun64 commented on a diff in pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "syun64 (via GitHub)" <gi...@apache.org>.
syun64 commented on code in PR #29881:
URL: https://github.com/apache/airflow/pull/29881#discussion_r1125752998


##########
airflow/stats.py:
##########
@@ -291,10 +315,11 @@ def wrapper(self, stat=None, *args, tags=None, **kwargs):
         if self.influxdb_tags_enabled:
             if stat is not None and tags is not None:
                 for k, v in tags.items():
-                    if all((c not in [",", "="] for c in v + k)):
-                        stat += f",{k}={v}"
-                    else:
-                        log.error("Dropping invalid tag: %s=%s.", k, v)
+                    if self.metric_tags_validator.test(k):

Review Comment:
   @hussein-awala 
   
   Aren't we only concatenating metric tags that are allowed? (i.e. not disabled / not blocked)
   
   If it's test(k) returns True, we will want to publish that metric tag



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #29881:
URL: https://github.com/apache/airflow/pull/29881#issuecomment-1455059028

   Static checks need fixing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] syun64 commented on a diff in pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "syun64 (via GitHub)" <gi...@apache.org>.
syun64 commented on code in PR #29881:
URL: https://github.com/apache/airflow/pull/29881#discussion_r1125766372


##########
airflow/stats.py:
##########
@@ -291,10 +315,11 @@ def wrapper(self, stat=None, *args, tags=None, **kwargs):
         if self.influxdb_tags_enabled:
             if stat is not None and tags is not None:
                 for k, v in tags.items():
-                    if all((c not in [",", "="] for c in v + k)):
-                        stat += f",{k}={v}"
-                    else:
-                        log.error("Dropping invalid tag: %s=%s.", k, v)
+                    if self.metric_tags_validator.test(k):

Review Comment:
   Yes - influxdb integration just adds the tag key:value pairs to the metric name with a comma-delimited standard, instead of having a separate Statsd client implementation that takes tags as a separate parameter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #29881:
URL: https://github.com/apache/airflow/pull/29881#discussion_r1125639145


##########
airflow/config_templates/default_airflow.cfg:
##########
@@ -486,6 +486,12 @@ statsd_prefix = airflow
 # start with the elements of the list (e.g: "scheduler,executor,dagrun")
 statsd_allow_list =
 
+# If you want to avoid sending all the available metrics to StatsD,
+# you can configure a block list of prefixes (comma separated) to filter out metrics that
+# start with the elements of the list (e.g: "scheduler,executor,dagrun").
+# If statsd_allow_list and statsd_block_list are both configured, statsd_block_list is ignored

Review Comment:
   One small NIT. Probably we should have a warning in the logs if both are configured.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk merged PR #29881:
URL: https://github.com/apache/airflow/pull/29881


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] syun64 commented on pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "syun64 (via GitHub)" <gi...@apache.org>.
syun64 commented on PR #29881:
URL: https://github.com/apache/airflow/pull/29881#issuecomment-1454941871

   > I think it is very confusing as defined now. I think we should change "allow_list_validator" field into "list_validator" and have two implementations:
   > 
   > * AllowListValidator
   > * BlockListValidator
   > 
   > And let the user choose which validator to use (by name for example). The way where you invert meaning of a list is super confusing. IMHO
   
   I see what you are saying @potiuk . I've made your suggested changes in stats.py. I've also introduced a separate `statsd_block_list` config parameter that expects a comma delimited string instead of `statsd_invert_allow_list`. Made a note on the config comment to note that if both statsd_allow_list and statsd_block_list are set, then statsd_block_list will be ignored in favor of statsd_allow_list.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] syun64 commented on pull request #29881: Blocklist to disable specific metric tags or metric names

Posted by "syun64 (via GitHub)" <gi...@apache.org>.
syun64 commented on PR #29881:
URL: https://github.com/apache/airflow/pull/29881#issuecomment-1455290714

   > Static checks need fixing.
   
   Fixed the checks and added the log based on your feedback! @potiuk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org