You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/08 13:30:38 UTC

[GitHub] [airflow] potiuk commented on issue #26861: Variables set via environment not accessible via HTTP API

potiuk commented on issue #26861:
URL: https://github.com/apache/airflow/issues/26861#issuecomment-1272320589

   I am inclining to close it as "not a bug" but would love to hear also what others say.
   
   While this change is indeed breaking old behaviour, one might say it is bringing back consistency to the behaviour of the API with other API's and fixing an incidental behaviour that the API shown before. 
   
   Nowhere it is mentioned, needed or even expected that the API is "single source of truth" - in this case there is not even single "truth" that we can talk about. It all depends on definition the `get_variable` function has.
   
   If we define "variable" APIs as exclusively showing and managing variables that are defined via DB then:
   
   a) it is consistent wihth list, update, create delete methods - it is impossible to modify in any way the variables defined via env variables or even secrets, so naturally, those methods do not work. Similarly it is just "standard" airflow behaviour that listing a variables defined by env variables or secrets is impossible. You need to know they are there. 
   
   b) if the variable is defined in both DB and env, then this is impossible to read DB variable which you just set with PATCH. That turns effectively the DB variable in "write-only"  - you can create, update, delete it. But you can't read it. That was very inconsistent behaviour that #25370
   
   c) the new behaviour is consistent with UI/Web behaviour - you cannot list/read env variables via web.
   
   d) if we leave the env variable reading, we open up for behaviour where it might be wrong anyway. The API endpoint runs on the webserver, so if you set variable via ENV variables only on workers and scheduler, but not on the webserver, the API will not return the right "ENV" variable value anyway. And TBH - the env variable for variables set on webserver are useless because none of the webserver code uses variables (except from manipulating the DB variables) - the only way you can actually use variables is in workers (and if you are a bit sloppy and do not follow our best practices - in DagFileProcessor - but this should be discourage due to performance reasons).
   
   That means that it cannot even be considered as "source of truth" because the "truth" will depend on whether the ENV is set consistently everywhere. It is not required either to have same ENV variable everywhere. You could even imagine that you have different types of workers and each worker has a different ENV variable set. For example you could use variable to get the right URL you should use for this particular worker to communicate - and the URL could be different for different workers. This is a very viable use of ENV-defined variables and nowhere in Airlfow we prevent or discourage doing it. It will "just work". In this case it's even IMPOSSIBLE to get the source of truth because the value of the variable from env will be different on each worker and `get_variable` would be completely meaningless in this context.
   
   Taking into account all of the above I have strong opinion that we should close it. I am going for holidays now, but if there are other committers who think similarly, and agree with me, then I think we should close that one.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org