You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "mobuchowski (via GitHub)" <gi...@apache.org> on 2023/02/01 15:36:50 UTC

[GitHub] [airflow] mobuchowski opened a new pull request, #29289: listener: simplify API by replacing SQLAlchemy event-listening by direct calls

mobuchowski opened a new pull request, #29289:
URL: https://github.com/apache/airflow/pull/29289

   After using Listener API after a couple of Airflow releases for OpenLineage plugin, we’ve found some of the aspects of the API lacking for that purpose.
   
   1. The API is canonically specified to call plugins when TaskInstanceState changes. This is very similar, but not 100% what we need - which is getting notified before and after task execution. That has some nice properties for Airflow - for example, SqlAlchemy event system can be utilized to notify listeners, instead of 10s of direct calls if needed - but not necessarily for us. What we need is an API that calls it “just before” and “just after” task execution, while providing it as close data as actual Operator execution code has. Since state changes - particularly for TaskInstanceState.RUNNING - aren’t that close (in code) to what we need, but is very close at execution, we need to do additional work, duplicating what Airflow does with
       1. Mapped tasks. On RUNNING, Mapped tasks aren’t yet “mapped”. This means we have to handle them specially, whereas we’d not need to do it if we were called later in the run process.
       2. Jinja templates aren’t yet rendered. This means we have to render them themselves, which is not a lightweight process and can call the Airflow database.   
   2. Second issue is connected to being directly called from the SqlAlchemy event system. We’re being called from inside a database connection. Currently, extractors perform a variety of tasks, which include calling external systems or Airflow DB in the meantime for various reasons. This means we can’t just execute code in the meantime - it fails the db transaction - we have to run it in some other thread, while unblocking the current one. This means we either have to block the thread - which right now is being done - or unblock it and risk data races.
   3. Another issue is that we’re not always getting actual TaskInstance objects in our plugin. What we’re getting is a detached SqlAlchemy that does not always have actual fields set. The issue https://github.com/apache/airflow/pull/21157 explains it in more depth. Basically, if not careful, Airflow changes can impact objects we receive in OL without noticing on the Airflow side.
   
   This change fixes all of the above, while preserving overall Pluggy API.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ashb merged pull request #29289: listener: simplify API by replacing SQLAlchemy event-listening by direct calls

Posted by "ashb (via GitHub)" <gi...@apache.org>.
ashb merged PR #29289:
URL: https://github.com/apache/airflow/pull/29289


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org