You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/24 01:33:46 UTC

[GitHub] [airflow] r-richmond commented on issue #14396: Make context less nebulous

r-richmond commented on issue #14396:
URL: https://github.com/apache/airflow/issues/14396#issuecomment-784646194


   >Dataclass is not pythonic at all
   
   As a counter point [Dacite](https://github.com/konradhalas/dacite) is a python library that shows how dataclasses can be used for something like this. Their config dataclass is a similar object to our context and is defined as follows.
   
   https://github.com/konradhalas/dacite/blob/d2206b2e4711859da0ea5862c395940f33693e80/dacite/config.py#L5-L12
   ```python
   @dataclass
   class Config:
       type_hooks: Dict[Type, Callable[[Any], Any]] = field(default_factory=dict)
       cast: List[Type] = field(default_factory=list)
       forward_references: Optional[Dict[str, Any]] = None
       check_types: bool = True
       strict: bool = False
       strict_unions_match: bool = False
   ```
   
   which is then consumed in their function `from_dict` 
   https://github.com/konradhalas/dacite/blob/d2206b2e4711859da0ea5862c395940f33693e80/dacite/core.py#L34
   ```python
   def from_dict(data_class: Type[T], data: Data, config: Optional[Config] = None) -> T:
       """Create a data class instance from a dictionary.
       :param data_class: a data class type
       :param data: a dictionary of a input data
       :param config: a configuration of the creation process
       :return: an instance of a data class
   ```
   
   As a user it feels so much better to be able to see the config in the function definition and then follow the type definition straight to the config objects strict definition (fields, types, optionally comments). 
   
   Alternatively when a dictionary is used its harder for an end user to reason about `what keys are present?`, `what are the values?`, `what are the types of the values?`, `where do I find this information?`. (Perhaps its my own ignorance but it isn't obvious to me today where I can look up this info for airflow's context dictionary and I've tried).
   
   As a developer tools like [mypy](https://github.com/python/mypy/blob/538d36481526135c44b90383663eaa177cfc32e3/docs/source/additional_features.rst), [pydantic](https://github.com/samuelcolvin/pydantic/) & various ides ([pycharm](https://blog.jetbrains.com/pycharm/2018/04/python-37-introducing-data-class/), vscode) also handle dataclasses quite well which helps prevent frustrating runtime errors. 
   
   
   >makes it difficult to evolve
   
   Not sure about this one.
   * To add new fields to data classes you simply add a new field to the dataclass definition with a default value.
   * To remove old fields you can use an ide's `find references` to remove all the old references and then delete the field from the dataclass.
       * Optionally, use a static analyzer like mypy before commit/merge to ensure you've gotten rid of all the old references
   
   Perhaps I'm still a rookie but for dictionaries it feels much harder to be sure you've gotten all the old key references out of the code, before you can safely remove it.
   
   >I think it's very pythonic to have dictionaries in Python and it brings a number of benefits, with little drawback
   
   I've called out a couple advantages I think dataclasses have over dictionaries. But I'm likely missing the dictionary advantages. Is Airflow using any of these advantages today?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org