Posted to commits@airflow.apache.org by "sa1 (via GitHub)" <gi...@apache.org> on 2023/09/16 05:31:41 UTC

[GitHub] [airflow] sa1 opened a new issue, #34416: Airflow DAG fails to run if `dag_id` + `task_id` is too long with OTEL integration enabled.

sa1 opened a new issue, #34416:
URL: https://github.com/apache/airflow/issues/34416

   ### Apache Airflow version
   
   2.7.1
   
   ### What happened
   
   Airflow DAG fails to run if the combined `dag_id` and `task_id` are too long and the OTEL integration is enabled. The following exception is raised and logged in the worker logs:
   
   ```Failed to execute task Invalid stat name: dev-cad.dag.datahub_config_deployment.viper_entrypoint.queued_duration. Please see https://opentelemetry.io/docs/reference/specification/metrics/api/#instrument-name-syntax.```
   
   There are no visible logs in the Airflow UI that would indicate the problem.
   
   The [metrics documentation](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html#setup-opentelemetry) claims that `stat_name_handler` can be used to rename stat names, which might work around this issue, but it seems the OTEL integration doesn't use this handler; only the `statsd` and `datadog` integrations do.
   
   ### What you think should happen instead
   
   The dag_id/task_id combination is obviously too long to be sent to otel as a metric name (which has a max limit of just 63 characters), but the DAG itself should not fail in this case.
   
   A number of metrics are excluded from the length check here, but `queued_duration` does not seem to be among them, so the DAG run fails before it even starts.
   https://github.com/apache/airflow/blob/35699acbf447ce190107665d0145f1bf63df5a92/airflow/metrics/validators.py#L57
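
To make the limit concrete, the stat name from the error message above can be measured against the 63-character limit quoted here. This is a minimal sketch: `MAX_NAME_LENGTH` and `exceeds_limit` are illustrative names introduced for this example, not Airflow identifiers, and the limit value is taken from this issue rather than from the code.

```python
# Limit quoted in this issue; MAX_NAME_LENGTH is an illustrative name,
# not an identifier from Airflow or OpenTelemetry.
MAX_NAME_LENGTH = 63

# Full stat name copied from the error message in this issue.
stat_name = "dev-cad.dag.datahub_config_deployment.viper_entrypoint.queued_duration"


def exceeds_limit(name: str, limit: int = MAX_NAME_LENGTH) -> bool:
    """Return True when the name is longer than the limit."""
    return len(name) > limit


print(len(stat_name), exceeds_limit(stat_name))
```

The name is 70 characters long, so it is 7 characters over the limit even though neither the `dag_id` nor the `task_id` is unusually long on its own.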
   
   
   
   ### How to reproduce
   
   Enable the OTEL integration with the prefix `dev-cad`, then trigger a new run of a DAG whose `dag_id` is `datahub_config_deployment` and whose first `task_id` is `viper_entrypoint`. The first task fails, and subsequently so does the rest of the DAG.
   
   
   
   ### Operating System
   
   Ubuntu 22.04.3 LTS
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==8.6.0
   apache-airflow-providers-celery==3.3.3
   apache-airflow-providers-common-sql==1.7.1
   apache-airflow-providers-ftp==3.5.1
   apache-airflow-providers-http==4.5.1
   apache-airflow-providers-imap==3.3.1
   apache-airflow-providers-openlineage==1.0.2
   apache-airflow-providers-postgres==5.6.0
   apache-airflow-providers-redis==3.3.1
   apache-airflow-providers-slack==8.0.0
   apache-airflow-providers-snowflake==5.0.0
   apache-airflow-providers-sqlite==3.4.3
   apache-airflow-providers-ssh==3.7.2
   
   
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   Docker based custom deployment on ECS Fargate.
   Separate fargate tasks for webserver, worker, scheduler and triggerer.
   
   
   ### Anything else
   
   Along with https://github.com/apache/airflow/issues/34405, this is one of the issues where OTEL exceptions lead to the failure of Airflow DAGs.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] sa1 commented on issue #34416: Airflow DAG fails to run if `dag_id` + `task_id` is too long with OTEL integration enabled.

Posted by "sa1 (via GitHub)" <gi...@apache.org>.
sa1 commented on issue #34416:
URL: https://github.com/apache/airflow/issues/34416#issuecomment-1729903387

   I'll try implementing the temporary fix today during the contributors' workshop at Airflow Summit.




[GitHub] [airflow] sa1 commented on issue #34416: Airflow DAG fails to run if `dag_id` + `task_id` is too long with OTEL integration enabled.

Posted by "sa1 (via GitHub)" <gi...@apache.org>.
sa1 commented on issue #34416:
URL: https://github.com/apache/airflow/issues/34416#issuecomment-1722280665

   I think truncation is already happening further down the code over here:
   https://github.com/apache/airflow/blob/35699acbf447ce190107665d0145f1bf63df5a92/airflow/metrics/validators.py#L158
   
   But metrics that are not in the exemption list triggered the exception before reaching that point. As the comment says, we should be careful about introducing new exemptions.
   https://github.com/apache/airflow/blob/35699acbf447ce190107665d0145f1bf63df5a92/airflow/metrics/validators.py#L51
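
The ordering described in this comment can be modeled roughly as follows. This is a simplified sketch of the behavior as described here, not the code in `validators.py`: `EXEMPT_PATTERNS`, `InvalidStatNameError` and `validate_stat_name` are hypothetical names, and the patterns are placeholders rather than the real exemption list.

```python
import re

MAX_NAME_LENGTH = 63  # limit quoted in this thread

# Placeholder patterns standing in for the exemption list (assumption:
# the real list is different and lives in airflow/metrics/validators.py).
EXEMPT_PATTERNS = (r"ti\.start", r"ti\.finish")


class InvalidStatNameError(Exception):
    """Raised for a too-long name that is not on the exemption list."""


def validate_stat_name(name: str) -> str:
    # Short names pass through unchanged.
    if len(name) <= MAX_NAME_LENGTH:
        return name
    # Exempt names are allowed through and truncated.
    if any(re.search(pattern, name) for pattern in EXEMPT_PATTERNS):
        return name[:MAX_NAME_LENGTH]
    # Every other too-long name raises, so truncation is never reached.
    raise InvalidStatNameError(f"Invalid stat name: {name}.")
```

Under this model, a metric such as `queued_duration` that is too long and not exempt raises before any truncation can apply, which matches the failure reported in the issue.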
   
   
   
   




[GitHub] [airflow] ferruzzi commented on issue #34416: Airflow DAG fails to run if `dag_id` + `task_id` is too long with OTEL integration enabled.

Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on issue #34416:
URL: https://github.com/apache/airflow/issues/34416#issuecomment-1731479336

   Yeah, this is a known issue and the reason for that exemption list and truncation.  We can't just rename all of the metrics because that would break back-compat, and that's why many of them are emitted twice (once with everything embedded in the name and once with tags) 
   
   It looks like those three metrics you called out were added after the change and SHOULD have been implemented with tags instead (and therefore should not have been added to the exemption list... but we didn't catch that in time so I guess it's the best answer).
   
   The unit test only makes sure the exemption list isn't changed, it doesn't check for new metrics which might break... maybe some kind of CI test would be wise, to prevent future new metrics from being added which have both `dag_id` and `task_id` in their name... I don't know what that would look like though.... we'd maybe have to parse the raw text of the changes looking for lines starting with `Stats` and including `dag_id` and `task_id` which do not match the exemptions list pattern?  maybe?
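
One possible shape for such a check, as a rough sketch only: scan the changed text line by line, keep lines that start with a `Stats.` call and mention both `dag_id` and `task_id`, and drop those matching an exemption pattern. The function name `find_suspect_stat_lines`, its parameters and the sample patterns are hypothetical and do not exist in Airflow.

```python
from __future__ import annotations

import re

# Matches lines that begin (after indentation) with a call such as Stats.incr(
STATS_CALL = re.compile(r"^\s*Stats\.\w+\(")


def find_suspect_stat_lines(
    source: str, exempt_patterns: tuple[str, ...] = ()
) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for Stats calls naming both ids."""
    suspects = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not STATS_CALL.match(line):
            continue
        if "dag_id" not in line or "task_id" not in line:
            continue
        # Skip lines covered by the (hypothetical) exemption patterns.
        if any(re.search(pattern, line) for pattern in exempt_patterns):
            continue
        suspects.append((lineno, line.strip()))
    return suspects
```

A line-based scan like this would miss `Stats` calls that span several lines or build the name in a separate variable, so it could only serve as a first-pass warning.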




Re: [I] Airflow DAG fails to run if `dag_id` + `task_id` is too long with OTEL integration enabled. [airflow]

Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on issue #34416:
URL: https://github.com/apache/airflow/issues/34416#issuecomment-1937867258

   This was addressed in https://github.com/apache/airflow/pull/34531; closing.  If it is still an issue, feel free to reopen.




[GitHub] [airflow] sa1 commented on issue #34416: Airflow DAG fails to run if `dag_id` + `task_id` is too long with OTEL integration enabled.

Posted by "sa1 (via GitHub)" <gi...@apache.org>.
sa1 commented on issue #34416:
URL: https://github.com/apache/airflow/issues/34416#issuecomment-1722143440

   It seems that in my case, the task_id of the first task in the DAG is enough to trigger this exception, so the entire DAG failed; presumably, if a later task triggered it, only that task would fail.




[GitHub] [airflow] hussein-awala commented on issue #34416: Airflow DAG fails to run if `dag_id` + `task_id` is too long with OTEL integration enabled.

Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on issue #34416:
URL: https://github.com/apache/airflow/issues/34416#issuecomment-1722264867

   > The dag_id/task_id combination is obviously too long to be sent to otel as a metric name (which has a max limit of just 63 characters), but the DAG itself should not fail in this case.
   
   Let's keep the discussion of whether the DAG should fail after an OTEL failure in #34405, to avoid discussing it twice.
   
   For the metric name limit, we have the same limitation with K8S resources, and we fix that by truncating the name and taking only the first X characters; we can do the same thing with OTEL metrics.
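
A minimal sketch of the "first X characters" approach, assuming X is the 63-character limit quoted above. `truncate_name` is a hypothetical helper, and the second stat name below is invented purely to illustrate a side effect.

```python
MAX_NAME_LENGTH = 63  # limit quoted in this thread


def truncate_name(name: str, max_length: int = MAX_NAME_LENGTH) -> str:
    """Keep only the first max_length characters of the name."""
    return name[:max_length]


# Stat name from the issue, plus an invented sibling metric for comparison.
reported = "dev-cad.dag.datahub_config_deployment.viper_entrypoint.queued_duration"
invented = "dev-cad.dag.datahub_config_deployment.viper_entrypoint.queued_delay_total"

print(truncate_name(reported))
print(truncate_name(reported) == truncate_name(invented))
```

One trade-off of plain prefix truncation is that two long names sharing their first 63 characters become the same metric name, as the two examples here do.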




Re: [I] Airflow DAG fails to run if `dag_id` + `task_id` is too long with OTEL integration enabled. [airflow]

Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi closed issue #34416: Airflow DAG fails to run if `dag_id` + `task_id` is too long with OTEL integration enabled.
URL: https://github.com/apache/airflow/issues/34416

