You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/24 22:53:24 UTC

[GitHub] [airflow] r-richmond edited a comment on issue #14396: Make context less nebulous

r-richmond edited a comment on issue #14396:
URL: https://github.com/apache/airflow/issues/14396#issuecomment-785437346


   > Let me explain why this would be a breaking change.... Surely you can map from Dataclass to Dict in "get_current_context()" method (otherwise you'd have to also correct the context["ti"] into context.ti. Thus it would be a breaking change.
   
   <details><summary>Changing `get_current_context` to have the following signature would prevent any breaking changes</summary>
   
   ```python
   def get_current_context(return_dataclass: bool=False) -> Union[Dict[str, Any], ~Dataclass.Context]:
     context: Dataclass = #
     if return_dataclass:
       return context
     else:
       logging.warning(msg="dictionary context has been deprecated in 2.1 please use return_dataclass = True")
       return as_dict(context)
   ```
   existing code continues just fine
   ```python
   @task
   def my_task():
       context = get_current_context()
       ti = context["ti"]
   ```
   this would then allow the users to do the following to get type hinting and at some point we could update the option's default value before finally removing it.
   ```python
   @task
   def my_task(return_dataclass=True):
       context = get_current_context()
       ti = context.ti
   ```
   </details>
   
   >But if you do that (i.e map dataclass to dict), then you loose the type hinting. This is precisely the place where typehinting is needed (in the user code). And I think Typed Dict (if used everywhere including return value from get_current_context() gives you what you need from the user point of view and there is no need to implement the > Dataclass step - as you have everything you need by implementing TypeDict.
   
   You're right that the returned dictionary appears to lose the type information. However:
   1. No type information is what we have today in user code so no change (although we'd have a nice clear definition of context findable from the function signature).
   1. Users would have an easy way to switch to strict typing mode with a simple migration path one function call at a time via the parameter.
   <details><summary>3. We would also give them the ability to use the nicer dataclass api & better IDE support</summary>
   Find references with dataclass (shows writes and reads and highlights field names)
   <img width="537" alt="image" src="https://user-images.githubusercontent.com/9246654/109076238-1ba3b500-76af-11eb-8223-5ba6ea09f69f.png">
   vs Find references with TypedDict (Misses read and doesn't highlight field names)
   <img width="590" alt="image" src="https://user-images.githubusercontent.com/9246654/109076512-8c4ad180-76af-11eb-8a7c-79609c098abc.png">
   
   
   </details>
   4. Users would also have a clear definition of context (This comes with either TypedDict or Dataclass) just wanted to call out this improvement.
   
   The advantage to dataclasses is that long term we will be providing users with a better api and developer experience. Making this change now is the easiest it will ever be as it only gets harder later in the project's life cycle. 
   
   Even in the [Typed Dict Pep 589](https://www.python.org/dev/peps/pep-0589/) Dataclasses are called out as a newer alternative for this use case. `Dataclasses are a more recent alternative to solve this use case`
   
   >In a number of cases yes. But in others flexibility of the dict and fact that you can add any value there, trumps.
   
   >While we do not encourage this behavior in any way, it is tempting to use the context this way and as a user you could do that and you could rely on context being dictionary able to hold any key you want. Dataclass breaks this assumption.
   
   You can do this with dataclasses as well; although as you mentioned it isn't encouraged. The better way would be to subclass the Dataclass but that is up to end users.
   
   <details><summary> Demo of adding a field dynamically</summary>
   
   ```python
   @dataclass
   class Demo:
       id: str
       value: int
   
   
   d = Demo("a", 1)
   
   # can add new fields
   d.new_field = "hello world"
   print(d.new_field)
   
   # prints hello world
   ```
   
   One nice thing is that mypy complains about it however the python code still runs
   `main.py:21: error: "Demo" has no attribute "new_field"`
   
   </details>
   
   I'm unaware of anything a dictionary can do that a dataclass can't. (I believe dataclasses internally are just dictionaries like a lot of python internals)
   
   >Or maybe I am missing something?
   
   To summarize here either dataclass or typedDict will have working code but leaning on [Raymond](https://twitter.com/raymondh/status/1175124784339808256?s=20) here in my assessment it feels like `context` is a class/instance problem not a dict problem hence my personal preference to use a dataclass. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org