Posted to commits@airflow.apache.org by "michaelmicheal (via GitHub)" <gi...@apache.org> on 2023/02/09 14:41:28 UTC

[GitHub] [airflow] michaelmicheal commented on pull request #29433: Add dataset update endpoint

michaelmicheal commented on PR #29433:
URL: https://github.com/apache/airflow/pull/29433#issuecomment-1424297531

   > I'm not sure of this, like the broadcasted event has no source dag/task etc. cc: @dstandish
   
   There are a few reasons why I think it's very important to at least support (not necessarily encourage) external dataset updates.
   
   1. Integration with external services and non-Airflow components of a pipeline. If a data science team has an external component of an ETL pipeline (for example, data warehouse ingestion), that external service should be able to trigger workflows that depend on a dataset whenever it updates the dataset outside of Airflow.
   2. Support for multi-instance Airflow architectures. With Astro, Cloud Composer, and custom solutions (like ours at Shopify), running multiple Airflow instances in production is very common. When one layer of the data platform is orchestrated in one instance and another layer in a different instance, we rely on being able to broadcast dataset changes between instances. We need this endpoint to pass dataset changes between Airflow instances through the API.
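   
   To make the use case concrete, here is a minimal sketch of how an external service might notify Airflow of a dataset update over the REST API. The endpoint path (`/api/v1/datasets/events`), payload shape, and the `dataset_uri` / `extra` field names are assumptions for illustration, not the settled API from this PR:
   
   ```python
   import json
   import urllib.request
   
   def build_dataset_event_request(base_url, dataset_uri, extra=None):
       # Build (but do not send) a POST request announcing that a dataset
       # was updated outside Airflow. Endpoint path and payload keys are
       # hypothetical, pending the final shape of this PR's endpoint.
       payload = {"dataset_uri": dataset_uri, "extra": extra or {}}
       return urllib.request.Request(
           url=f"{base_url}/api/v1/datasets/events",
           data=json.dumps(payload).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="POST",
       )
   
   # e.g. an ingestion service announcing a fresh load of a warehouse table
   req = build_dataset_event_request(
       "https://airflow.example.com",
       "s3://warehouse/raw/orders",
       extra={"source": "ingestion-service"},
   )
   ```
   
   The same call works instance-to-instance: an upstream Airflow deployment could fire it from an `on_success_callback` so a downstream deployment's dataset-triggered DAGs pick up the change.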


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org