Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/04 11:33:00 UTC

[GitHub] [airflow] potiuk commented on issue #28402: Support of reading dags from database by task_runners

potiuk commented on issue #28402:
URL: https://github.com/apache/airflow/issues/28402#issuecomment-1370815981

   > I thought serialized dag is just a python source code, isn't it?
   
   No, it's not. It's just the JSON-serialized DAG structure and metadata.
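To illustrate the difference between a serialized structure and runnable code, here is a toy sketch in plain Python. It does NOT use Airflow's real serialization schema: the serializer `toy_serialize_task`, the `example_etl` DAG id and the `extract_rows` helper are all hypothetical. The point is that the JSON keeps task ids, arguments and a *reference* to the callable, but neither the callable's body nor anything it imports.

```python
import json


# Hypothetical shared helper. In a real deployment it would typically live
# in a separate module that the DAG file imports.
def extract_rows(limit):
    return list(range(limit))


def toy_serialize_task(task_id, callable_obj, kwargs):
    """Toy serializer (hypothetical, not Airflow's schema).

    It keeps structure and metadata, but stores only a dotted *reference*
    to the callable, never the callable's source code.
    """
    return {
        "task_id": task_id,
        "callable_ref": f"{callable_obj.__module__}.{callable_obj.__qualname__}",
        "op_kwargs": kwargs,
        "downstream_task_ids": [],
    }


toy_dag = {
    "dag_id": "example_etl",
    "tasks": [toy_serialize_task("extract", extract_rows, {"limit": 3})],
}

payload = json.dumps(toy_dag, indent=2)
print(payload)
```

Loading `payload` with `json.loads` elsewhere gives back the structure, but actually running the task would still require the module that defines `extract_rows` to be importable on that machine.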
   
   What you see in the UI is **just** the source code of the DAG file in question - but you cannot see any of the code it imports there. And this is only for "inspection" - it's not possible to run this code, because the code it depends on is missing.
   
   In the current state, we cannot (and should not) serialize Python code to the database, simply because a Python DAG can import an arbitrary number of libraries, common code, other DAGs, etc. And if you have dynamic imports or local imports in the DAGs, it is extremely difficult (or actually impossible) to determine which files should be put in such a database. Effectively, what you ask for is to store the whole DAG folder as a record in the database for every single DAG run.
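A rough sketch of why collecting "all the files a DAG needs" is hard, using only Python's standard `ast` module on a hypothetical DAG file (the module names `common.helpers`, `teams.<team>.tasks` and `other_dags.shared` are made up for illustration). Literal imports, including local ones inside functions, can be found statically, but a module name computed at run time, here from an environment variable, can only be flagged as unresolvable without executing the code. Even the statically found names do not say which files on disk back them; that depends on `sys.path` at run time.

```python
import ast

# Hypothetical DAG file contents. This source is only parsed, never executed.
DAG_SOURCE = '''
import os
from common.helpers import build_tasks
import importlib

team = os.environ.get("TEAM", "default")
plugin = importlib.import_module(f"teams.{team}.tasks")

def later():
    from other_dags.shared import CONFIG
    return CONFIG
'''


def scan_imports(source):
    """Return (statically known module names, dynamic import calls)."""
    tree = ast.parse(source)
    static, dynamic = set(), []
    for node in ast.walk(tree):  # walks nested scopes too, so local imports are found
        if isinstance(node, ast.Import):
            static.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep the leading dots so "from .x import y" is not mistaken for "x".
            static.add("." * node.level + (node.module or ""))
        elif isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name in ("import_module", "__import__"):
                arg = node.args[0] if node.args else None
                # Only a plain string literal can be resolved without running the code.
                if not (isinstance(arg, ast.Constant) and isinstance(arg.value, str)):
                    dynamic.append(ast.unparse(node))  # requires Python 3.9+
    return static, dynamic


static, dynamic = scan_imports(DAG_SOURCE)
print(sorted(static))
print(dynamic)
```

In this example the scanner finds four module names but can only report the `importlib.import_module(...)` call as "unknown until run time", which is exactly the case where no static analysis can tell which files would have to be stored.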
   
   With the way Airflow currently works and how "flexible" Python is, that makes no sense - any kind of file sharing does the job much better than reading the whole DAG folder and converting it into a blob of all DAG files stored in a relational database. From a performance point of view, it makes no sense.
   
   Changing this would be quite a fundamental change in how Airflow works, so it definitely does not pass the bar of a "Feature" - it definitely belongs in the "Airflow Improvement Proposal" camp (so no, @hussein-awala - I don't think we are going to assign it to anyone, as this is definitely not something that would ever get accepted before we have a proper proposal and discussion about it).
   
   There are related Airflow Improvement Proposals, opened but never completed, that aimed at solving that problem (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher, https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest).
   
   If you would like to change the behaviour, then the right approach is to pick some of them, complete them (they are in Draft status), be able to explain and defend all the different cases, and start a discussion about it on the Airflow devlist (see https://airflow.apache.org/community/ for details on how to join it). You will need to specify it at a level of detail that allows assessing all the cases: small and big deployments, performance considerations, and so on.
   
   Converting it into a discussion.


-- 
This is an automated message from the Apache Git Service.