You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Howard Yoo <ho...@gmail.com> on 2022/05/17 02:17:18 UTC

Re: [DISCUSSION] AIP-49 OpenTelemetry Support for Apache Airflow

Finally had some time to go over the changes and wrote summary of it in AIP
for better understanding, and updated the proposal doc:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow#AIP49OpenTelemetrySupportforApacheAirflow-CodeChangesSummary


Hope this helps!
Howard

On Fri, Apr 29, 2022 at 11:49 AM Howard Yoo <ho...@gmail.com> wrote:

> Got it. Will describe the changes that might be subjected to be done -
> however, please note that the changes were the things that I did on my own
> during the POC, so the final changes may obviously be different.
>
> On Fri, Apr 29, 2022 at 3:57 AM Elad Kalif <el...@apache.org> wrote:
>
>> I'd prefer that the Code Changes section would be more detailed. This is
>> a very important part.
>> I'm not sure referring to a private repo is the way to do it but even if
>> you should at least list the components which are subject to changes.
>> You don't need to specify every change you are going to do - only the
>> critical ones where it couples with airflow core.
>> You have changes for dag_proccessing, executors, scheduler_job and even
>> migration to add columns.
>>
>> https://github.com/howardyoo/airflow/commit/cc83c2b377ac22f0e7ef82e7f59784df972037fd
>> These are important
>>
>>
>> On Wed, Apr 27, 2022 at 10:09 PM Howard Yoo <ho...@gmail.com> wrote:
>>
>>> Hi Elad, I have updated the AIP-49 with the appropriate changes to
>>> contain what you requested in your previous comments. I think the AIP is
>>> fairly comprehensive and ready to be voted, unless there is any objections.
>>>
>>> Sincerely,
>>> Howard
>>>
>>> On Mon, Apr 25, 2022 at 11:03 PM Howard Yoo <ho...@gmail.com> wrote:
>>>
>>>> Hi Elad,
>>>> Will take a look at it and let you know!
>>>>
>>>> Thanks,
>>>> Howard
>>>>
>>>> On Mon, Apr 25, 2022 at 6:35 AM Elad Kalif <el...@apache.org> wrote:
>>>>
>>>>> Hi Howard,
>>>>> Have you made progress with addressing the comments?
>>>>> I think once points are addressed we can maybe start a vote?
>>>>>
>>>>> On Tue, Apr 12, 2022 at 12:39 PM Jarek Potiuk <ja...@potiuk.com>
>>>>> wrote:
>>>>>
>>>>>> > 1. I would assume the path for StatsD (dogstatsd) would be for
>>>>>> deprecation - we will perhaps comment or mark it as deprecation - and
>>>>>> should follow the established (or usual?) process of feature deprecation of
>>>>>> Airflow, once the opentelemetry is in place.
>>>>>>
>>>>>> Yep. Deprecation should be. And maybe accompanied with a "statsd
>>>>>> open-telemetry exporter" ? I think eventually we should have just OTEL
>>>>>> stats and nothing else. The current "configurable" metrics class
>>>>>> should be fully replaced with "OTEL configurable" metrics. Same as
>>>>>> (but that's a much longer and uncertain path) logging configuration
>>>>>> should get replaced if/when OTEL logging lives to its promises.
>>>>>>
>>>>>> > 2. Airflow will provide its own list of 'instrumented metrics' out
>>>>>> of the box - with the list specified in its documentation so that users
>>>>>> would be aware of them. However, with opentelemetry, the users will also
>>>>>> have the ability to add their 'custom' metrics / traces / logs to be
>>>>>> collected via opentelemetry as needed. That option and how-to's should also
>>>>>> be documented.
>>>>>>
>>>>>> Good point. Actually the nice thing is that adding new metrics could
>>>>>> then be done as part of task execution for example - so we should
>>>>>> indeed have some libraries/tools or maybe even operator's /task
>>>>>> interface should have some built-in capabilities and allow for
>>>>>> declarative ways of adding metrics (but this can be added as a
>>>>>> follow-up - it does not need to be described and hashed out yet IMHO -
>>>>>> but we can add it to the docs as "future improvement") .
>>>>>>
>>>>>> > 3. A little clarification - I believe there were two POC's - which
>>>>>> at the time during the first POC, it had lacked some of the features (e.g.
>>>>>> traces and logs). However, in the second POC, there is a stable release of
>>>>>> Traces, and beta release of logs, so things have been progressing.
>>>>>> Opentelemetry is highly evolving and many things do change and get added
>>>>>> relatively quickly. During my second POC (the one mentioned in the attached
>>>>>> PDF) I was able to validate that all of the 'key' features that we needed
>>>>>> to implement opentelemetry for airflow has been released and available.
>>>>>>
>>>>>> Yeah. I concur with that Howard wrote.  What was there a few months
>>>>>> ago a little "shaky", becomes more and more solid as time passes.  And
>>>>>> I also spoke with a few people who are involved in OpenTelemetry
>>>>>> standard and development (one of my friends is site-lead for Sumo
>>>>>> Logic Poland and they take active part in that effort and I just spoke
>>>>>> to him about it). Also AWS is very much vested in it as far as I
>>>>>> understand - I also spoke to Google and they are very much supporting
>>>>>> OTEL as industry standard. I think there is a firm industry backing
>>>>>> behind OTEL and there is no way it "won't progress" or "falter". This
>>>>>> is a little bit of a "leap of faith" that it will become fully
>>>>>> featured for our needs, but I think what is already there is "enough"
>>>>>> to justify the move and anything that comes out of Beta is a bonus.
>>>>>> Eventually if things will not go fast enough for us - we can also
>>>>>> actually contribute there on the "collection" level.  It will likely
>>>>>> have a much better outcome than if we try to integrate airflow with
>>>>>> multiple services ourselves - we will just have to make sure things
>>>>>> get collected "well" - and then all the different services will more
>>>>>> likely than not write exporters for it.
>>>>>>
>>>>>> > 4. This is a debatable topic - and I believe those may not be a
>>>>>> part of the core airflow code base, but would suggest perhaps we could
>>>>>> create a 'contribution' repo which may maintain those third party assets
>>>>>> (e.g. Grafana dashboard, Datadog dashboard, etc.) that users may use and
>>>>>> even participate in maintaining it.
>>>>>>
>>>>>> Yeah. I think  this is also an opportunity for someone who is
>>>>>> specializing in those or even let Grafana add it to their "portfolio".
>>>>>> I can imagine there might be some basic dashboards generated by
>>>>>> companies which do some kinds of integrations (and with an option of
>>>>>> "come to us when you want more customizability". While we won't be
>>>>>> able to endorse those, we can easily let it land in the "ecosystem"
>>>>>> page of ours. I think also we can easily reach out (for example to
>>>>>> Grafana to do it - in the case of Grafana, Myrle Krantz, ex Treasurer
>>>>>> from ASF who I know well is a Senior Manager in Grafana responsible
>>>>>> for Cloud team).
>>>>>>
>>>>>>
>>>>>> > 5. POC may not have all the changes, as the work was to 'prototype'
>>>>>> and answer questions like 'what if' when opentelemetry is in place. The
>>>>>> proposal actually has link to a GIT repo that contains the changes that
>>>>>> were done during the POC:
>>>>>> https://github.com/howardyoo/airflow/tree/opentelemetry-poc-1 .
>>>>>> Since the details of the changes would make the existing PDF even more
>>>>>> subjected to TL;DR, I linked this git branch for anyone interested in the
>>>>>> changes to take a look. I hope this would be sufficient.
>>>>>>
>>>>>> I think Elad is right that some  more details need to "surface" from
>>>>>> the POC to AIP. While the changes are very little - they don't change
>>>>>> any flows or logic in Airflow, it's more to "selectively" add
>>>>>> collections and make sure common "Span" id is shared throughout the
>>>>>> code (which I think is the biggest change).
>>>>>> I think Howard, it might make sense to extract parts of the POC and
>>>>>> put them as an outline of changes to implement in the "AIP".
>>>>>>
>>>>>> The POC is more trace of what you've done, but I think simply
>>>>>> translating this into "this is what we need to do" for those who will
>>>>>> just read AIP is important. Our AIP's are much more "Technical" in
>>>>>> nature than most of the
>>>>>>
>>>>>> >
>>>>>> > 6. Yes, I agree - I will update the AIP proposal to make the scope
>>>>>> more clearer. Thanks for the feedback!
>>>>>>
>>>>>> Yeah. Adding "future improvements" and especially "This is what we are
>>>>>> not going to do and leave for later" is important part of every AIP. I
>>>>>> think it's sometimes much more important to state what we are NOT
>>>>>> going to vs. what we are going to do.
>>>>>>
>>>>>> >
>>>>>> > Howard
>>>>>> >
>>>>>> > On Sun, Apr 3, 2022 at 1:27 PM Elad Kalif <el...@apache.org>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Thanks Howard!
>>>>>> >> Sorry for the delay, this was a long read. The PDF alone is 19
>>>>>> pages  :)
>>>>>> >> looks very good!
>>>>>> >>
>>>>>> >>  I have 6 questions/points to raise:
>>>>>> >>
>>>>>> >> 1. I'm not clear about what is to happen with StatsD .
>>>>>> >> It states "Make OpenTelemetry and StatsD optional and
>>>>>> interchangeable."
>>>>>> >> But do we want to support both in the long run?
>>>>>> >> It doesn't specify if we are deprecating statsD along with
>>>>>> completion of this AIP.
>>>>>> >>
>>>>>> >> 2. regarding adding metrics.
>>>>>> >> Do we intend to let users define their own KPIs/metrics to be
>>>>>> measured or it will be a closed list set by Airflow?
>>>>>> >>
>>>>>> >> 3. The POC specifies it uses a feature of open-telemetry (add
>>>>>> metrics) which was not yet released. Do we know the timeline for the
>>>>>> feature to be released?
>>>>>> >> Can we vote on a plan to use something which is not yet publicly
>>>>>> available?
>>>>>> >>
>>>>>> >> 4. Are the Grafana dashboards / other dashboards to be part of the
>>>>>> Airflow core code base?
>>>>>> >> I wonder if this should be in a dedicated repo?
>>>>>> >>
>>>>>> >> 5. Could you please clarify in the AIP page what changes are
>>>>>> required to the Airflow code base?
>>>>>> >> Some of them appear in the PDF but I'm not sure if that is all of
>>>>>> them?
>>>>>> >>
>>>>>> >> 6. The PDF has several open questions/ideas.
>>>>>> >> I think it would be best to add to the AIP a scope paragraph
>>>>>> listing what will be included in the first phase and what is left for other
>>>>>> phases.
>>>>>> >>
>>>>>> >> On Fri, Apr 1, 2022 at 4:26 PM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> Or maybe that people are so stunned by the beauty and usefulness
>>>>>> of it
>>>>>> >>> that they cannot even say a word :)
>>>>>> >>>
>>>>>> >>> On Thu, Mar 31, 2022 at 11:37 PM Howard Yoo <ho...@gmail.com>
>>>>>> wrote:
>>>>>> >>> >
>>>>>> >>> > I think it may mean all is well, perhaps :-)
>>>>>> >>> > Howard
>>>>>> >>> >
>>>>>> >>> > On Thu, Mar 31, 2022 at 9:29 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>> >>> >>
>>>>>> >>> >> Hello everyone,
>>>>>> >>> >>
>>>>>> >>> >> Would be great to get some comments and reviews  - especially
>>>>>> from
>>>>>> >>> >> those who are users and are doing monitoring. Howard made a
>>>>>> lot of
>>>>>> >>> >> effort to show examples of how OTEL might help in this.
>>>>>> >>> >>
>>>>>> >>> >> Otherwise, is the silence sign that all is good ?
>>>>>> >>> >>
>>>>>> >>> >> J.
>>>>>> >>> >>
>>>>>> >>> >> On Fri, Mar 25, 2022 at 9:53 PM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>> >>> >> >
>>>>>> >>> >> > And let me add to it - we laid some foundations for it with
>>>>>> Melodie
>>>>>> >>> >> > Ezeani - the Outreachy intern where we did some internal
>>>>>> integration
>>>>>> >>> >> > work that let us understand the challenges and state of
>>>>>> >>> >> > open-telemetry.
>>>>>> >>> >> >
>>>>>> >>> >> > I am super excited about what open-telemetry can bring to
>>>>>> Airflow -
>>>>>> >>> >> > both short term (in metrics and instrumentation) and longer
>>>>>> term - in
>>>>>> >>> >> > logging when logging is mature enough.
>>>>>> >>> >> >
>>>>>> >>> >> > The open-telemetry mov is at the heart of the principles we
>>>>>> have -
>>>>>> >>> >> > making Airflow "modern" but at the same time delegating
>>>>>> what's not
>>>>>> >>> >> > "core" to those who can do it better. Open Telemetry is
>>>>>> precisely
>>>>>> >>> >> > about that,
>>>>>> >>> >> >
>>>>>> >>> >> > Howard particularly brought a great experience from using
>>>>>> Open
>>>>>> >>> >> > Telemetry before and making some good judgements and POC on
>>>>>> the
>>>>>> >>> >> > metrics produced by Airflow and how they can be useful from
>>>>>> the
>>>>>> >>> >> > "maintainer value" side. I looked at it from the technical
>>>>>> integration
>>>>>> >>> >> > POV - and Melodie helped to validate some of the assumptions
>>>>>> and
>>>>>> >>> >> > expose some of the technical challenges.
>>>>>> >>> >> > The composite result is good, but we are looking with Howard
>>>>>> on some
>>>>>> >>> >> > insightful comments and critique - especially from the users
>>>>>> of
>>>>>> >>> >> > Airflow!
>>>>>> >>> >> >
>>>>>> >>> >> > I look forward to more cool stuff on Airflow!
>>>>>> >>> >> >
>>>>>> >>> >> > J.
>>>>>> >>> >> >
>>>>>> >>> >> > On Fri, Mar 25, 2022 at 9:39 PM Howard Yoo <
>>>>>> howardyoo@gmail.com> wrote:
>>>>>> >>> >> > >
>>>>>> >>> >> > > Hi all,
>>>>>> >>> >> > >
>>>>>> >>> >> > > I am pleased to announce the start of the discussion for
>>>>>> the new AIP draft that was recently been published:
>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow
>>>>>> >>> >> > >
>>>>>> >>> >> > > Jarek Potuik and I have been discussing about this
>>>>>> proposal since early this year. During that time, we worked together on
>>>>>> drafting this proposal, as well as doing another round of mini-POC to
>>>>>> refresh and test on feasibility of OpenTelemetry on Apache Airflow.
>>>>>> >>> >> > >
>>>>>> >>> >> > > As the POC was successful in terms of testing the
>>>>>> OpenTelemetry on Airflow, we would like to expand the discussion to a wider
>>>>>> user community here this mailing list to gather more consensus, comments,
>>>>>> and feedbacks regarding this AIP.
>>>>>> >>> >> > >
>>>>>> >>> >> > > Sincerely,
>>>>>> >>> >> > > Howard and Jarek
>>>>>>
>>>>>