Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/05 16:38:42 UTC

[GitHub] [airflow] subkanthi opened a new issue #20053: Add Prometheus Provider

subkanthi opened a new issue #20053:
URL: https://github.com/apache/airflow/issues/20053


   ### Description
   
   Prometheus supports a Pushgateway, which lets jobs push metrics that Prometheus then scrapes.
   This could be useful when metrics need to be pushed during an ETL stage.
   
   For example, BigQuery stores metrics in INFORMATION_SCHEMA tables, which can be queried using Standard SQL. It is useful to run such a query periodically and store the results as metrics in Prometheus.
   
   https://github.com/m-lab/prometheus-bigquery-exporter
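A minimal sketch of what such a push could look like, using only the Python standard library (the official `prometheus_client` package offers `push_to_gateway` for the same purpose). It assumes a Pushgateway reachable at a hypothetical `http://localhost:9091`; the metric and job names are illustrative only. The Pushgateway accepts the Prometheus text exposition format via HTTP PUT or POST to `/metrics/job/<job>`:

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional


def _escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, value: float, help_text: str,
                 labels: Optional[Dict[str, str]] = None) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        pairs = ",".join(
            f'{k}="{_escape_label_value(v)}"' for k, v in sorted(labels.items())
        )
        label_str = "{" + pairs + "}"
    # HELP text must escape backslash and newline.
    help_escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    return (f"# HELP {name} {help_escaped}\n"
            f"# TYPE {name} gauge\n"
            f"{name}{label_str} {value}\n")


def pushgateway_url(gateway: str, job: str) -> str:
    """Build the Pushgateway grouping URL /metrics/job/<job>."""
    return f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"


def push(gateway: str, job: str, body: str, timeout: float = 10.0) -> int:
    """PUT the metrics, replacing everything previously pushed under this job."""
    request = urllib.request.Request(
        pushgateway_url(gateway, job),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `push("http://localhost:9091", "bigquery_metrics", render_gauge("bq_dataset_size_bytes", 1024, "Dataset size", {"dataset": "sales"}))`. Prometheus then scrapes the Pushgateway, which keeps the last pushed value until it is replaced or deleted. PUT replaces every metric in the job's group, whereas POST would only replace metrics with the same names.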
   
   ### Use case/motivation
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #20053: Add Prometheus Provider

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20053:
URL: https://github.com/apache/airflow/issues/20053#issuecomment-986283065


   Ah, OK. Then yes, I was reading it wrong :). As I read it now, it's just about having an operator that could be part of a DAG to push some metrics? If that's the case, then I think this is a cool idea :).



[GitHub] [airflow] potiuk edited a comment on issue #20053: Add Prometheus Provider

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #20053:
URL: https://github.com/apache/airflow/issues/20053#issuecomment-986274931


   Just a related comment for the discussion: we are looking into OpenTelemetry support during our Outreachy internship (https://github.com/apache/airflow/projects/14). Exploring the options might take quite a while, but it is likely to turn into a long-term solution, provided that we produce an AIP and it gets approved.
   
   @subkanthi, do I understand correctly that you want to work on it?
   
   I thought about it, and it might be beneficial for everyone to see how a direct Prometheus integration might work (experimentally), developed in parallel. If someone is willing to spend the time and focus on it, then possibly yes, but we should treat it (same as OpenTelemetry) as an "experimental" feature or POC. That would also help build knowledge about the integration, show what challenges it brings, and let us see whether the OT approach can bring the same level of integration.
   
   I'm not sure, however, that we would want to merge it if, by the time it is complete (likely including Helm chart integration etc.), we see that we can achieve a similar level of integration with OpenTelemetry.
   
   See the comments here: https://github.com/apache/airflow/issues/19972#issuecomment-986090147. Also @Melodie97, you might be interested in following the progress here if @subkanthi opens a PR, so that it could serve as inspiration (most likely both ways).
   
   @subkanthi - does it sound like a good approach?



[GitHub] [airflow] potiuk commented on issue #20053: Add Prometheus Provider

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20053:
URL: https://github.com/apache/airflow/issues/20053#issuecomment-986279158


   Hmm, I read the issue as a proposal to build a "shared service" inside Airflow that various providers could use to report custom metrics. Am I wrong?
   
   If that is the case ("provide a common way for all providers to report their telemetry"), this is most likely pretty much what the OpenTelemetry integration would provide. That is the scope of the POC: what's possible, how deeply we can integrate, and how OpenTelemetry might be used by different parts of Airflow: reporting low-level metrics of Airflow, reporting higher-level metrics of Airflow components (such as SQLAlchemy), reporting "logical metrics" of Airflow, and finally, possibly, reporting custom metrics from other parts of Airflow (including providers).
   
   Once we integrate OpenTelemetry into Airflow, it might become the de facto standard for all components (including providers), because it will be the common "telemetry" language that Airflow uses (at least I see this as something we will be able to validate during the OpenTelemetry POC, so that we get a good idea of what's possible and how much it involves). The provider part is where the overlap might be significant, and I think it at least warrants a discussion on the devlist if the proposal goes beyond a POC.
   
   I do not yet know what it would mean to add a "common" Prometheus provider that BigQuery would use.
   
   Is it an actual "provider" that could be installed as one of the 70+ providers? Would the Google provider / BigQuery depend on it? Which versions? What will the dependencies be? Or is it a shared service that is part of Airflow itself, or just a set of library calls that do not even require configuring Airflow as a whole? How "common" will it be? Which parts will be reusable? Will they really be reusable? Why do we need it at all; couldn't this be implemented directly in BigQuery?
   
   There are many questions that I think warrant, at the very least, discussing the concept on the devlist. Not really a call, but a mail to the devlist explaining the concept so that it can be discussed there. This should likely turn into an Airflow Improvement Proposal (AIP) if it is going to become a kind of "shared service". That is why we are running the POC now: to be better prepared to answer many of those questions, with the goal of coming up with an AIP.
   
   This is usually how it works in Airflow for changes that go beyond a single provider or fix, and (at least as I read the proposal) this is something you would like to provide as a "common" interface for potentially many providers, which definitely warrants a devlist discussion.
   
   



[GitHub] [airflow] subkanthi commented on issue #20053: Add Prometheus Provider

Posted by GitBox <gi...@apache.org>.
subkanthi commented on issue #20053:
URL: https://github.com/apache/airflow/issues/20053#issuecomment-986280955


   My intention for this specific issue is not related to OpenTelemetry or Airflow metrics, and it is definitely not a shared service.
   
   The idea was just to add Prometheus like any other provider, for example our InfluxDB provider. In this case I would simply treat Prometheus as another database or provider.
   
   The BigQuery example was just what came to mind, but I'm sure there are many use cases where users want to write counters from DAGs to Prometheus, for example pushing an S3 bucket's data size to Prometheus in a daily ETL.
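As an illustration of the S3 case, a daily task could sum object sizes from the pages returned by boto3's `list_objects_v2` paginator and render the total as a gauge. This is a minimal sketch: the metric name `s3_bucket_size_bytes` and the bucket name are hypothetical, and the actual push (for example via `prometheus_client.push_to_gateway` or an HTTP PUT to a Pushgateway) is left out.

```python
from typing import Any, Dict, Iterable


def total_bucket_size(pages: Iterable[Dict[str, Any]]) -> int:
    """Sum the Size of every object across list_objects_v2 result pages."""
    # Pages for an empty bucket (or prefix) carry no "Contents" key.
    return sum(obj["Size"] for page in pages for obj in page.get("Contents", []))


def bucket_size_metric(bucket: str, size_bytes: int) -> str:
    """Render the bucket size as a Prometheus text-format gauge."""
    name = "s3_bucket_size_bytes"  # hypothetical metric name
    # S3 bucket names cannot contain quotes or backslashes, so no escaping here.
    return (
        f"# HELP {name} Total size of objects in an S3 bucket.\n"
        f"# TYPE {name} gauge\n"
        f'{name}{{bucket="{bucket}"}} {size_bytes}\n'
    )


# Usage with boto3 (requires AWS credentials; "my-bucket" is a placeholder):
#   import boto3
#   pages = boto3.client("s3").get_paginator("list_objects_v2").paginate(Bucket="my-bucket")
#   print(bucket_size_metric("my-bucket", total_bucket_size(pages)))
```

Keeping the summing logic separate from the boto3 call makes it easy to test with plain dictionaries and to reuse from whatever operator or task ends up doing the push.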
   
   Sorry for not being clear.



[GitHub] [airflow] eladkal commented on issue #20053: Add Prometheus Provider

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #20053:
URL: https://github.com/apache/airflow/issues/20053#issuecomment-986288296


   I assume that it will also solve https://github.com/apache/airflow/issues/11549 ?



[GitHub] [airflow] subkanthi edited a comment on issue #20053: Add Prometheus Provider

Posted by GitBox <gi...@apache.org>.
subkanthi edited a comment on issue #20053:
URL: https://github.com/apache/airflow/issues/20053#issuecomment-986275673


   @potiuk This is a different use case; I wish we could have a quick call.
   This is useful not for Airflow metrics, but for scheduling metric transfers across applications.
   
   As in the BigQuery example I mentioned, there are cases where you need to run a BigQuery SQL query against the INFORMATION_SCHEMA tables (for example, to read dataset sizes) and periodically write the results to Prometheus so you can track them.
   Essentially, we are trying to build time series of metrics through this operator.
   
   This project kind of does that.
   
   https://github.com/m-lab/prometheus-bigquery-exporter
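A sketch of the BigQuery case under stated assumptions: it assumes the `INFORMATION_SCHEMA.TABLE_STORAGE` view with its `total_logical_bytes` column, a `region-us` qualifier, and the `google-cloud-bigquery` client, none of which the thread specifies. Each query result row becomes one labeled gauge sample per dataset, so every scheduled push adds one point to each dataset's time series. The metric name is hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical query; assumes the TABLE_STORAGE view is available in region "us".
DATASET_SIZE_SQL = """
SELECT table_schema AS dataset, SUM(total_logical_bytes) AS size_bytes
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
GROUP BY table_schema
"""


def escape_label_value(value: str) -> str:
    # Escape backslash, double quote and newline as the text format requires.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def dataset_sizes_to_metrics(rows: Iterable[Tuple[str, int]]) -> str:
    """Render (dataset, size_bytes) rows as one gauge family with a dataset label."""
    name = "bigquery_dataset_logical_bytes"  # hypothetical metric name
    lines = [
        f"# HELP {name} Logical bytes stored per BigQuery dataset.",
        f"# TYPE {name} gauge",
    ]
    for dataset, size_bytes in rows:
        lines.append(f'{name}{{dataset="{escape_label_value(dataset)}"}} {size_bytes}')
    return "\n".join(lines) + "\n"


# Usage with google-cloud-bigquery (requires GCP credentials):
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(DATASET_SIZE_SQL).result()
#   text = dataset_sizes_to_metrics((r["dataset"], r["size_bytes"]) for r in rows)
```

Emitting all datasets under a single metric name with a `dataset` label, rather than one metric per dataset, keeps the series queryable and aggregatable in PromQL.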
   
   
   Hopefully I made some sense.
   
   



[GitHub] [airflow] potiuk commented on issue #20053: Add Prometheus Provider

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20053:
URL: https://github.com/apache/airflow/issues/20053#issuecomment-986279636


   (And to add: a call with just me is definitely not needed or advisable :). The dev community needs to be involved and see the proposal (if its scope is what I understood), which is why the devlist is the right target.

