You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/05 11:24:24 UTC

[GitHub] [airflow] Melodie97 opened a new pull request #20677: Opentelemetry instrumentation

Melodie97 opened a new pull request #20677:
URL: https://github.com/apache/airflow/pull/20677


   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20677: Opentelemetry instrumentation

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20677:
URL: https://github.com/apache/airflow/pull/20677#issuecomment-1008327884


   Hey Melodie - I pushed a fixup - I managed to find and workaround the problems:
   
   It was not easy and required to know quite a bit of Airflow internals:
   
   * we needed to run gunicorn inside Airflow with `opentelemetry-instrument" not only airflow
   * open-telemetry does not seem to run properly with gunicorn by default, so even if application is properly instrumented (which happens automatically by adding opentelemetry-instrunment to gunicorn), the fork model of Gunicorn is such that it does not really work well with open-telemetry without the workaround documented here:
   
   https://opentelemetry-python.readthedocs.io/en/latest/examples/fork-process-model/README.html
   
   My chnge adds this workaround and produces this :tada: :tada: 
   
   ![Screenshot from 2022-01-09 17-09-32](https://user-images.githubusercontent.com/595491/148690714-415d2441-89bb-4b11-9639-3bd260f6a0bf.png)
   
   Try it and we can talk about next steps :)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] hterik commented on a change in pull request #20677: Opentelemetry instrumentation Flask

Posted by GitBox <gi...@apache.org>.
hterik commented on a change in pull request #20677:
URL: https://github.com/apache/airflow/pull/20677#discussion_r780956860



##########
File path: airflow/tracer.py
##########
@@ -0,0 +1,23 @@
+from opentelemetry import trace
+from opentelemetry.instrumentation.flask import FlaskInstrumentor
+from opentelemetry.instrumentation.requests import RequestsInstrumentor
+from opentelemetry.exporter.jaeger.thrift import JaegerExporter
+from opentelemetry.sdk.resources import SERVICE_NAME, Resource
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+
+def init(service_name, metric_port):
+    jaeger_exporter = JaegerExporter(

Review comment:
       I think the idea of opentelemetry is to provide a vendor-agnostic _agent_ on the localhost, which can forward the data to an external server, which could be either Jaeger or any other collector. 
   So you don't need to run the whole Jaeger server (all-in-one) inside airflows own docker-compose, only the agent is required. 
   Jaeger ships with such an agent and there is also one provided by opentelemetry, which can forward to Jaeger. Attaching these to console output could be difficult to not get mixed up with logging i assume?
   
   As long as there is a way to configure for forwarding the telemetry to a server running outside of the airflow docker-compose it's good :)
   
   (Sorry for drive-by-comments without knowing the full context and goals here, feel free to ignore)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] shicholas commented on a change in pull request #20677: Opentelemetry instrumentation Flask

Posted by GitBox <gi...@apache.org>.
shicholas commented on a change in pull request #20677:
URL: https://github.com/apache/airflow/pull/20677#discussion_r780850977



##########
File path: airflow/tracer.py
##########
@@ -0,0 +1,23 @@
+from opentelemetry import trace
+from opentelemetry.instrumentation.flask import FlaskInstrumentor
+from opentelemetry.instrumentation.requests import RequestsInstrumentor
+from opentelemetry.exporter.jaeger.thrift import JaegerExporter
+from opentelemetry.sdk.resources import SERVICE_NAME, Resource
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+
+def init(service_name, metric_port):
+    jaeger_exporter = JaegerExporter(

Review comment:
       I'd consider using the `ConsoleSpanExporter` instead of Jaeger. Jaeger is great but requires extra steps like the server in the docker-compose file. The ConsoleSpan on the other hand just prints Open Telemetry spans to STDOUT.
   
   Will email about this too




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Melodie97 closed pull request #20677: Opentelemetry instrumentation Flask

Posted by GitBox <gi...@apache.org>.
Melodie97 closed pull request #20677:
URL: https://github.com/apache/airflow/pull/20677


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Melodie97 commented on pull request #20677: Opentelemetry instrumentation Flask

Posted by GitBox <gi...@apache.org>.
Melodie97 commented on pull request #20677:
URL: https://github.com/apache/airflow/pull/20677#issuecomment-1008338009


   Hurray! Thanks a lot for the help Jarek! I’ll look into it and work more
   integrations
   
   On Sun, 9 Jan 2022 at 5:22 PM, Jarek Potiuk ***@***.***>
   wrote:
   
   > cc: @eladkal <https://github.com/eladkal> @xurror
   > <https://github.com/xurror> 🎉 ^^
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/airflow/pull/20677#issuecomment-1008329763>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/ASWQCXHDMPEPUCWY27SQZGTUVGY3HANCNFSM5LJVKPIQ>
   > .
   > Triage notifications on the go with GitHub Mobile for iOS
   > <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
   > or Android
   > <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
   >
   > You are receiving this because you authored the thread.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20677: Opentelemetry instrumentation Flask

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20677:
URL: https://github.com/apache/airflow/pull/20677#discussion_r780961195



##########
File path: airflow/tracer.py
##########
@@ -0,0 +1,23 @@
+from opentelemetry import trace
+from opentelemetry.instrumentation.flask import FlaskInstrumentor
+from opentelemetry.instrumentation.requests import RequestsInstrumentor
+from opentelemetry.exporter.jaeger.thrift import JaegerExporter
+from opentelemetry.sdk.resources import SERVICE_NAME, Resource
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+
+def init(service_name, metric_port):
+    jaeger_exporter = JaegerExporter(

Review comment:
       See the mail - I explained some details there. We alreeady have 'integrations" in airflow (separate compose files) and we have a mechanism to enable/disable them in the dev environment - the docker-compose we are talking about is the "Breeze" compose (which is Airflow's development environment and it allows to start whole airflow with optional integrations (kerberos, mongo, different databases etc.) so Jaeger (as server) will eventually have it's own compose file (as all other integrations in Airflow Breeze). And yeah in airflow we will only have (and also as an optional extra) jaeger agent (in OT nomenclature - it is exporter) - and possibly few other exporters.
   
   We even have task for that, to modularise the access. It's not super straightforward though, as we'eve learned that with Gunicorn (and likely Airflow worker model where we use forking), we need to add some extra "exporter-specific" initialization after fork (https://opentelemetry-python.readthedocs.io/en/latest/examples/fork-process-model/README.html) but we will get there eventually.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20677: Opentelemetry instrumentation Flask

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20677:
URL: https://github.com/apache/airflow/pull/20677#discussion_r780962474



##########
File path: airflow/tracer.py
##########
@@ -0,0 +1,23 @@
+from opentelemetry import trace
+from opentelemetry.instrumentation.flask import FlaskInstrumentor
+from opentelemetry.instrumentation.requests import RequestsInstrumentor
+from opentelemetry.exporter.jaeger.thrift import JaegerExporter
+from opentelemetry.sdk.resources import SERVICE_NAME, Resource
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+
+def init(service_name, metric_port):
+    jaeger_exporter = JaegerExporter(

Review comment:
       And yeah - another tasks is to integrate Airflow with "services" who handle OT - (Datadog and others) - so totally Airlfow will be able to connect to external OT consumers and Jaeger is really there to make it easy to automatically start development environment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20677: Opentelemetry instrumentation

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20677:
URL: https://github.com/apache/airflow/pull/20677#issuecomment-1008329763


   cc: @eladkal @xurror :tada: ^^


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20677: Opentelemetry instrumentation Flask

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20677:
URL: https://github.com/apache/airflow/pull/20677#discussion_r780808425



##########
File path: airflow/www/gunicorn_config.py
##########
@@ -37,3 +42,18 @@ def on_starting(server):
 
     # Load providers before forking workers
     ProvidersManager().connection_form_widgets
+
+
+def post_fork(server, worker):
+    """
+    Needed to workoround open-telemetry not working well with gunicorn fork model.
+
+    See more: https://opentelemetry-python.readthedocs.io/en/latest/examples/fork-process-model/README.html
+    """
+    server.log.info("Worker spawned (pid: %s)", worker.pid)
+
+    resource = Resource.create(attributes={"service.name": "api-service"})

Review comment:
       BTW. I think it should be "airflow-service" here :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org