You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/07 10:53:29 UTC

[GitHub] [airflow] potiuk commented on issue #14396: Make context less nebulous

potiuk commented on issue #14396:
URL: https://github.com/apache/airflow/issues/14396#issuecomment-962588693


   > Looking at maybe taking this on. I also had a good bit of "what the heck is even IN `context` and what's it for?" when I first started looking at the codebase, so I can see the benefit to new contributors and it'll be a good one to learn, myself.
   
   Cool!
   
   I think we have to make sure that we know what can be achieved here and what the "real" benefits of context stronger naming and typing is. Just adding typying without being able to make good use of it is kinda useless.  I.e. where and for whom actually "more typing and autocompletable naming" in context would help. 
   
   Let me summarize my view on what the requirements here are and who are the "users" of each requirement. What we are talking about here is to make easier to "write" dags and a bit easier to read them as `context.ti` is much more readable than `context.['ti']`.  We could also thing about some kind of verification and flagging typos. 
   
   * old way of accessing the context should work without any changes (context['ti'] should continue to work, also the "wrongly_typoed" context parameters should behave according to `template_undefined` in DAG as in usual Jinja context)  - this is hard requirement which we cannot drop.
   
   * auto-complete should include hints from the docstrings that explain the meaning of parameters (some of the dates require longer explanation) - this would be the single most important "improvement" we can get from this and without it, changing the context has very little improvement because for new users you'd have to anyhow look-it-up separately, for old users - they know the context field by heart already anyway, so auto-complete is just small improvement for them.
   
   * the documentation about the context https://github.com/apache/airflow/blob/main/docs/apache-airflow/templates-ref.rst should be generated automatically from the code. We do not want to maintain those separately  - the "code" should be single source of truth for the documentation. For me this is also a "hard" requirement that we cannot drop.
   
   * auto-complete should work in the source code when you are referring context directly in "classic operator's execute" - this would help those who write new operators.
   
   * The `typing` part in the context used this way works with MyPy type checks (I am working on bringing it back in the new version after temporary disabling it now).  This would prevent a number of problems for people writing dags and mistaking types.
   
   * Ideally (but that might be more difficutl) the context fields should be auto-completeable when you use `@task` decorator. In `@task` decorator you can pass context fields as parameter  of the task, it would be great if we can find a way to get context fields to be also auto-completable there. This would be super-helpful for those who write DAGs following the "task flow" way (which is somethig that I think we all agree is the "default" in the futuere - we are converting all examples to follow it and I believe it produces much cleaner, less-boilerplate code. This would be fantastic if we can get it working (and if MyPy checks would work here as well). Not sure though if this is achievable.
   
   * Also it would also be good if we could achieve auto-complete in jinja statements. That is a "nice-to-have" and - if possible - would likely be IDE-specific, but might be a very nice addition. Though it is really useful only for users who use the "classic" way of developing DAGs which we move away from, so I think this is not crucial and we can easily skip that one.
   
   > 1. Would the ideal solution be to make a new module under `airflow.models` to define the Context TypedDict and import it into `taskinstance.py`, or just define it at the top of `airflow.models.taskinstance`?
   
   I am not sure whether TypedDict is the right solution here, if we look at the above list. I'd rather say that a dedicated Context class which mixes-in the dict and field access would be a better solution in this case. When it comes to docstrings and explanation of the fields, and auto-complete in @task decorator, I am not sure TypedDict is a good solution. But that needs some exploring I think.
   
   > 2. If/once I do define the TypedDict, are any other changes needed in order to implement it?   I have a rough implementation locally already and it seems to pass CI without any further changes, but want to make sure I'm not underestimating the scope of this.
   
   As above - I am not sure how many users will benefit from "TypedDict" approach. I belive making a dedicated Context class with mix-in like approach woudl serve more cases from the list above.
   
   > 3. Context appears to contain some other Dicts (`conf` and `var`, specifically), should I chase that rabbit and define those as well while I am at it?
   
   * var for sure not - vars depend on what is in the DB or Secret Manager so it will be next to impossible to know the variables without those.
   * conf - that's more possible. We alreday have a well defined, automatically updated docs for conf variables, and this woudl be a nice addition to be able to auto-complete/doc all the conf variables there. It's not crucial, but since you are at it, it might be good to implement if easy.
    
   > 4. It looks like we use both `datetime.datetime` and `pendulum.DateTime` in various places, which would we prefer to use here in the Context definition for the various timestamp fields?  [[ [Discussing here](https://apache-airflow.slack.com/archives/CCPRP7943/p1635954964150000) ]]
   
   I think pendulum.DateTime (https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html?highlight=next_ds#variables). I think we should look at the code and see what kind of "datetime" we have there already. But that is one of the very good side-effects of this change  - maybe MyPy will be able to detect when we mix the datetime's  and warn us here?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org