Posted to dev@airflow.apache.org by Tzu-ping Chung <tp...@astronomer.io.INVALID> on 2024/01/02 10:23:02 UTC

[Discussion] AIP-60 Standard URI representation for Airflow Datasets

Happy 2024 everyone!

I’m going to kick off the new year by formally proposing a new AIP. It attempts to standardise the URI format used by Dataset events. This is largely driven by the lack of adoption of Datasets. It turns out (maybe not surprisingly, when I think about it) that simply triggering events from a literal string name isn’t particularly useful (or usable) in practical contexts, and some “smarter” features are generally sought after in most cases.

The two most popular examples are listening on a directory for file additions, and making operators emit Dataset events automatically, like OpenLineage events. Both are technically doable, but rather impractical without abusing the literal string Dataset identifier. By introducing standard semantics for the URI, these can be implemented more easily on the scheduler (listening) side instead.
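To make the directory-listening example concrete, here is a minimal Python sketch of what standard URI semantics could enable on the scheduler side. It is not taken from the AIP: the normalization rules (lowercase scheme and host, no trailing slash, fragment dropped) and the helper names `normalize_dataset_uri` and `is_under_directory` are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_dataset_uri(uri: str) -> str:
    """Hypothetical normalization rules, not the AIP's specification."""
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    if not parts.scheme:
        raise ValueError(f"Dataset URI must have a scheme: {uri!r}")
    # Assumption: the host part is case-insensitive (no case-sensitive userinfo).
    netloc = parts.netloc.lower()
    # Drop trailing slashes so "dir/" and "dir" compare equal; keep "/" for root.
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped; the query string is kept as-is.
    return urlunsplit((parts.scheme, netloc, path, parts.query, ""))


def is_under_directory(file_uri: str, directory_uri: str) -> bool:
    """True if file_uri lives strictly below directory_uri (same scheme and host)."""
    f = urlsplit(normalize_dataset_uri(file_uri))
    d = urlsplit(normalize_dataset_uri(directory_uri))
    if (f.scheme, f.netloc) != (d.scheme, d.netloc):
        return False
    # Compare on a path-segment boundary so "/raw_backup" is not under "/raw".
    prefix = d.path if d.path.endswith("/") else d.path + "/"
    return f.path.startswith(prefix)
```

With a literal string identifier, `"s3://my-bucket/data/raw/2024/01/02.csv"` and `"s3://My-Bucket/data/raw/"` are simply different strings; with agreed semantics, a scheduler could decide that the first is a file addition under the second.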

Please find the document on Confluence:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-60+Standard+URI+representation+for+Airflow+Datasets

Comments on the specification and/or implementation, as well as proposals for additional URI formats, are welcome. Note that we don’t need to add all the formats in the AIP; it only attempts to establish a process for doing so, so that new ones can be added to the documentation in the future.

TP
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
For additional commands, e-mail: dev-help@airflow.apache.org


Re: [Discussion] AIP-60 Standard URI representation for Airflow Datasets

Posted by Bolke de Bruin <bd...@gmail.com>.
Nice work. I added some comments on parameters in the path component and
on the representation of cloud storage URIs.

Bolke

On Wed, 3 Jan 2024 at 22:51, Jarek Potiuk <ja...@potiuk.com> wrote:

> Happy 2024,
>
> Added some nits - mostly informational, generally LGTM

--
Bolke de Bruin
bdbruin@gmail.com

Re: [Discussion] AIP-60 Standard URI representation for Airflow Datasets

Posted by Jarek Potiuk <ja...@potiuk.com>.
Happy 2024,

Added some nits - mostly informational, generally LGTM