Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/02 13:37:50 UTC

[GitHub] [airflow] potiuk opened a new issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

potiuk opened a new issue #19972:
URL: https://github.com/apache/airflow/issues/19972


   Includes: 
   
   * adding Airflow configuration
   * starting Airflow and gathering metrics
   * integration with monitoring software (Prometheus/Grafana/others?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-1008335582


   @cherusk  We should completely separate #11549 from this discussion.
   
   It has nothing to do with the instrumentation/OT integration that we are working on with @Melodie97.
   #11549 is a completely different issue (and quite misleading in the context of the OpenTelemetry integration).
   
   What is described in #11549 is a dedicated Prometheus operator that would use an existing Pushgateway set up outside of Airflow and allow "DAG writers" to push any metrics they want.
   
   What we are describing here is an OpenTelemetry integration with Airflow as a platform. The OT integration we want to add here will (eventually) give the "admin" of Airflow a way to set it up so that it provides various useful metrics of the Airflow platform as a whole, statistics of DAG execution, etc.
   
   So both the scope and the audience of those two are different.
   





[GitHub] [airflow] subkanthi commented on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
subkanthi commented on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986844568


   > Also @Melodie97 - see #20053 - it turned out that @subkanthi's idea was about adding a provider that exposes the Pushgateway API, so it is indeed different from our proposal.
   > 
   > In short, what @subkanthi proposed is a separate Prometheus provider with a Hook/Operator that Airflow users could use in a DAG, something like this (conceptually):
   > 
   > ```
   > @task
   > def run_bigquery_job():
   >     hook_bq = BigQueryHook()
   >     bq_result = hook_bq.run_the_job()
   >     hook_prometheus = PrometheusHook()
   >     bq_job_metrics = get_metrics_for_bq_job(bq_result)
   >     hook_prometheus.push_metrics_to_gateway(bq_job_metrics)
   > ```
   > 
   > Is that the right "assessment" @subkanthi ?
   
   Yes, that's correct @potiuk.
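
To make the conceptual `push_metrics_to_gateway` call a bit more concrete, here is a minimal sketch of what such a push might boil down to, using only the Python standard library. It assumes the Pushgateway's documented `/metrics/job/<job>` endpoint and the Prometheus text exposition format; the function names, the gateway URL, the job name and the metric name are all hypothetical and not part of Airflow or of any proposed provider. The request is only built here, not sent.

```python
from typing import Mapping
from urllib.parse import quote
from urllib.request import Request


def format_metrics(metrics: Mapping[str, float]) -> bytes:
    """Render metrics as Prometheus text exposition format (all as gauges)."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The exposition format requires a trailing newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def build_push_request(gateway_url: str, job: str, metrics: Mapping[str, float]) -> Request:
    """Build (but do not send) a PUT request that replaces the job's metric group."""
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return Request(
        url,
        data=format_metrics(metrics),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


if __name__ == "__main__":
    # Hypothetical gateway address and metric; nothing is sent over the network.
    req = build_push_request("http://localhost:9091", "bq_job", {"bq_rows": 10})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"), end="")
```

Passing the returned request to `urllib.request.urlopen` would perform the actual push; a real hook would presumably also handle authentication, grouping labels and errors.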
   





[GitHub] [airflow] potiuk commented on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986073740


   In the long term, the goal of this one (the whole project) is to see if we can get rid of statsd/datadog entirely and use only OpenTelemetry.
   Once we have OpenTelemetry support, we should not need a specific Prometheus integration.
   
   We can use https://www.npmjs.com/package/@opentelemetry/exporter-prometheus to export all telemetry to Prometheus. 
   
   Why would we want a separate Prometheus-specific implementation if our goal is to be monitoring-system agnostic (which is basically the promise of OpenTelemetry)?





[GitHub] [airflow] potiuk edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986073740


   In the long term, the goal of this one (the whole project) is to see if we can get rid of statsd/datadog entirely and use only OpenTelemetry.
   Once we have OpenTelemetry support, we should not need a specific Prometheus integration.
   
   We can use https://open-telemetry.github.io/opentelemetry-python/exporter/prometheus/prometheus.html to export all telemetry to Prometheus. 
   
   Why would we want a separate Prometheus-specific implementation if our goal is to be monitoring-system agnostic (which is basically the promise of OpenTelemetry)?





[GitHub] [airflow] subkanthi commented on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
subkanthi commented on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986071051


   I was thinking of also supporting Prometheus: we could push to the Prometheus Pushgateway, even though it's not recommended. It would require minimal changes.
   
   https://prometheus.io/docs/practices/pushing/
   





[GitHub] [airflow] potiuk edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986090147


   Whatever we come up with now, we will have to support in the future under our backwards-compatibility promise. I am pretty sure we do not want short-term Prometheus support now if we have a long-term plan to support it differently.
   
   Currently there are statsd exporters people can use (and do use) to export statsd metrics to Prometheus. In the future (if the POC works and the AIP is accepted) we will have an OpenTelemetry exporter that people would use (and we will support both OpenTelemetry and statsd for backwards compatibility).
   
   What purpose would it serve to add yet another option to Airflow - one that would again have to be deprecated and dropped (especially since there is a working method to get the metrics into Prometheus via an exporter)?
   
   What problem would it solve?





[GitHub] [airflow] subkanthi commented on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
subkanthi commented on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986076483


   I'm definitely not disagreeing with the long-term goal of supporting OpenTelemetry. I was just suggesting a short-term initiative: supporting Prometheus with our current setup, with minimal changes to the stats class and keeping the architecture of pushing metrics.
   The counter and gauge concepts are the same in Prometheus and statsd.
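
As a rough illustration of that last point, the sketch below renders the same counter and gauge once as a statsd datagram line and once in the Prometheus text exposition format. The helper names are hypothetical, and the metric names are only illustrative examples. One caveat worth keeping in mind: a statsd counter line carries an increment, while a Prometheus counter exposes a cumulative value, so the concepts line up but the wire semantics are not identical.

```python
def to_statsd(name: str, value: float, kind: str) -> str:
    """Format one sample as a statsd line, e.g. 'name:1|c' or 'name:42|g'."""
    suffix = {"counter": "c", "gauge": "g"}[kind]
    return f"{name}:{value}|{suffix}"


def to_prometheus(name: str, value: float, kind: str) -> str:
    """Format one sample in Prometheus text exposition format."""
    # Dots are not valid in classic Prometheus metric names, so map them to '_'.
    prom_name = name.replace(".", "_")
    return f"# TYPE {prom_name} {kind}\n{prom_name} {value}\n"


if __name__ == "__main__":
    samples = [
        ("airflow.scheduler_heartbeat", 1, "counter"),
        ("airflow.dagbag_size", 42, "gauge"),
    ]
    for name, value, kind in samples:
        print(to_statsd(name, value, kind))
        print(to_prometheus(name, value, kind), end="")
```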





[GitHub] [airflow] cherusk commented on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
cherusk commented on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-1008323944


   
   Pardon, but from the wording of the code examples and the overall issue, I'm not certain whether the eventual "package" will contain a Prometheus operator that can reach the Prometheus API directly. As a minimum viable technical approach, I'd expect something along the lines of a wrapper around SimpleHttpOperator.
   
   I don't know why this needs to be confined to the Prometheus Pushgateway, which is a very specific way of interacting with Prometheus.
   
   Thanks for clarifying!
   
   Relevant for:
   https://github.com/apache/airflow/issues/11549





[GitHub] [airflow] potiuk edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986830720


   Also @Melodie97 - see #20053 - it turned out that @subkanthi's idea was about adding a provider that exposes the Pushgateway API, so it is indeed different from our proposal.
   
   In short, what @subkanthi proposed is a separate Prometheus provider with a Hook/Operator that Airflow users could use in a DAG, something like this (conceptually):
   
   ```
   @task
   def run_bigquery_job():
       hook_bq = BigQueryHook()
       bq_result = hook_bq.run_the_job()
       hook_prometheus = PrometheusHook()
       bq_job_metrics = get_metrics_for_bq_job(bq_result)
       hook_prometheus.push_metrics_to_gateway(bq_job_metrics)
   ```
   
   Is that the right "assessment" @subkanthi ?
   





[GitHub] [airflow] potiuk edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-1008335582


   @cherusk  We should completely separate #11549 from this discussion.
   
   It has nothing to do with the instrumentation/OT integration that we are working on with @Melodie97.
   #11549 is a completely different issue (and quite misleading in the context of the OpenTelemetry integration).
   
   What is described in #11549 is a dedicated Prometheus operator that would use an existing Pushgateway set up outside of Airflow and allow "DAG writers" to push any metrics they want.
   
   What we are describing here is an OpenTelemetry integration with Airflow as a platform. The OT integration we want to add here will (eventually) give the "admin" of Airflow a way to set it up so that it provides various useful metrics of the Airflow platform as a whole, statistics of DAG execution, etc. All this is independent of the metrics collector and visualisation: we just gather the metrics with OpenTelemetry and let the admin user configure any exporter they want.
   
   So both the scope and the audience of those two are different.
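
The "platform gathers metrics, admin picks the exporter" split described above can be sketched in a few lines of plain Python. This is only an illustration of the separation of concerns, not the OpenTelemetry API and not Airflow code: `Metrics`, `build_metrics`, the exporter names and the `dag_runs` metric are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, float]
Exporter = Callable[[List[Sample]], None]


class Metrics:
    """Collects counters; knows nothing about where they end up."""

    def __init__(self, exporter: Exporter) -> None:
        self._exporter = exporter
        self._counters: Dict[str, float] = {}

    def incr(self, name: str, amount: float = 1.0) -> None:
        self._counters[name] = self._counters.get(name, 0.0) + amount

    def flush(self) -> None:
        # Hand the current samples to whichever exporter was configured.
        self._exporter(sorted(self._counters.items()))


def build_metrics(exporter_name: str, exporters: Dict[str, Exporter]) -> Metrics:
    """Pick the exporter by name, as an admin-facing config option might."""
    try:
        return Metrics(exporters[exporter_name])
    except KeyError:
        raise ValueError(f"unknown exporter: {exporter_name!r}") from None


if __name__ == "__main__":
    exporters: Dict[str, Exporter] = {"console": print}
    metrics = build_metrics("console", exporters)
    metrics.incr("dag_runs")
    metrics.incr("dag_runs", 2)
    metrics.flush()
```

The instrumented code only ever calls `incr`; swapping the backend means changing one configuration value, which is the property the OT integration is aiming for.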
   





[GitHub] [airflow] Melodie97 commented on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
Melodie97 commented on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986763145


   Currently looking at how to go about this, will keep everyone updated





[GitHub] [airflow] potiuk commented on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986830720


   Also @Melodie97 - see #20053 - it turned out that @subkanthi's idea was about adding a provider that exposes the Pushgateway API, so it is indeed different from our proposal.
   
   In short, what @subkanthi proposed is a separate Prometheus provider with a Hook/Operator that Airflow users could use in a DAG, something like this (conceptually):
   
   ```
   @task
   def run_bigquery_job():
       hook_bq = BigQueryHook()
       bq_result = hook_bq.run_the_job()
       hook_prometheus = PrometheusHook()
       bq_job_metrics = get_metrics_for_bq_job(bq_result)
       hook_prometheus.push_metrics_to_gateway(bq_job_metrics)
   ```
   
   Is that the right "assessment" @subkanthi ?
   





[GitHub] [airflow] subkanthi edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
subkanthi edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986071051


   I was thinking of also supporting Prometheus: we could push to the Prometheus Pushgateway, even though it's not recommended. It would require minimal changes.
   
   https://prometheus.io/docs/practices/pushing/
   
   This might be quite useful in the short term, as you don't need an additional statsd -> Prometheus exporter process running, and we can connect to an existing Prometheus instance (including managed ones on AWS and Google).





[GitHub] [airflow] potiuk edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986090147


   Whatever we come up with now, we will have to support in the future under our backwards-compatibility promise. I am pretty sure we do not want short-term Prometheus support now if we have a long-term plan to support it differently.
   
   Currently there are statsd exporters people can use (and do use) to export statsd metrics to Prometheus. In the future (if the POC works and the AIP is accepted) we will have an OpenTelemetry exporter that people would use (and we will support both OpenTelemetry and statsd for backwards compatibility).
   
   What purpose would it serve to add yet another option to Airflow - one that would again have to be deprecated and dropped (especially since there is a working method to get the metrics into Prometheus via an exporter)? What problem would it solve?





[GitHub] [airflow] potiuk edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986090147


   Whatever we come up with now, we will have to support in the future under our backwards-compatibility promise. I am pretty sure we do not want short-term Prometheus support now if we have a long-term plan to support it differently.
   
   Currently there are statsd exporters people can use (and do use) to export statsd metrics to Prometheus. In the future (if the POC works and the AIP is accepted) we will have an OpenTelemetry exporter that people would use (and we will support both OpenTelemetry and statsd for backwards compatibility).
   
   What purpose would it serve to add yet another option to Airflow - one that would again have to be deprecated and dropped (especially since there is a working method to get the metrics into Prometheus via an exporter)? What problem would it solve?





[GitHub] [airflow] potiuk edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986090147


   Whatever we come up with now, we will have to support in the future under our backwards-compatibility promise. I am pretty sure we do not want short-term Prometheus support now if we have a long-term plan to support it differently.
   
   Currently there are statsd exporters people can use (and do use) to export statsd metrics to Prometheus. In the future (if the POC works and the AIP is accepted) we will have an OpenTelemetry exporter that people would use (and we will support both OpenTelemetry and statsd for backwards compatibility).
   
   What purpose would it serve to add yet another option to Airflow - one that would again have to be deprecated and dropped (especially since there is a working method to get the metrics into Prometheus via an exporter)?
   
   What problem would it solve?





[GitHub] [airflow] potiuk edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-1008335582


   @cherusk  We should completely separate #11549 from this discussion.
   
   It has nothing to do with the instrumentation/OT integration that we are working on with @Melodie97.
   #11549 is a completely different issue (and quite misleading in the context of the OpenTelemetry integration).
   
   What is described in #11549 is a dedicated Prometheus operator that would use an existing Pushgateway set up outside of Airflow and allow "DAG writers" to push any metrics they want.
   
   What we are describing here is an OpenTelemetry integration with Airflow as a platform. The OT integration will give the "admin" of Airflow a way to set it up so that it provides various useful metrics of the Airflow platform as a whole, statistics of DAG execution, etc.
   
   So both the scope and the audience of those two are different.
   





[GitHub] [airflow] potiuk commented on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-1008335582


   @cherusk  We should completely separate #11549 from this discussion.
   
   It has nothing to do with the instrumentation/OT integration that we are working on with @Melodie97.
   It is a completely different issue (and quite misleading in the context of the OpenTelemetry integration).
   
   What is described in #11549 is a dedicated Prometheus operator that would use an existing Pushgateway set up outside of Airflow and allow "DAG writers" to push any metrics they want.
   
   What we are describing here is an OpenTelemetry integration with Airflow as a platform. The OT integration will give the "admin" of Airflow a way to set it up so that it provides various useful metrics of the Airflow platform as a whole, statistics of DAG execution, etc.
   
   So both the scope and the audience of those two are different.
   





[GitHub] [airflow] subkanthi edited a comment on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
subkanthi edited a comment on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986071051


   I was thinking of also supporting Prometheus: we could push to the Prometheus Pushgateway, even though it's not recommended. It would require minimal changes.
   At a high level, looking at the stats class, we could probably do it like the datadog statsd implementation.
   
   https://prometheus.io/docs/practices/pushing/
   
   This might be quite useful in the short term, as you don't need an additional statsd -> Prometheus exporter process running, and we can connect to an existing Prometheus instance (including managed ones on AWS and Google).





[GitHub] [airflow] potiuk commented on issue #19972: POC on configuring and running Airflow with basic OT integration (selected metrics)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19972:
URL: https://github.com/apache/airflow/issues/19972#issuecomment-986090147


   Whatever we come up with now, we will have to support in the future under our backwards-compatibility promise. I am pretty sure we do not want short-term Prometheus support now if we have a long-term plan to support it differently.
   
   Currently there are statsd exporters people can use (and do use) to export statsd metrics to Prometheus. In the future (if the POC works and the AIP is accepted) we will have an OpenTelemetry exporter that people would use (and we will support both OpenTelemetry and statsd for backwards compatibility).
   
   What purpose would it serve to add yet another option to Airflow - one that would again have to be deprecated and dropped (especially since there is a working method to get the metrics into Prometheus via an exporter)? What problem would it solve?

