Posted to users@airflow.apache.org by Chris Redekop <ch...@replicon.com> on 2022/01/15 19:09:36 UTC

How to (properly) cache data in a custom hook/connection?

I'm playing around with implementing a custom hook/connection to my
service, and there is some custom service discovery logic I need to perform
on a connection-by-connection basis. Is there any "proper" way to cache the
result (basically just a hostname) of my service discovery, so I don't have
to re-do the discovery on every task that uses my hook? I'd like to be able
to cache it globally if possible, but even on a dagrun-by-dagrun basis
would be ok. These are the options I've considered, in order of decreasing
attractiveness:
    1. Variables - simple, clean, but kinda gross that the cached values
       will show up in the UI
    2. Read/write values directly to the XCom table in the metadata db -
       transparent, but obviously ugly and not exactly future-friendly
    3. Provide my own external, globally accessible storage (S3 or Redis or
       something). Probably the most "correct"(?) but also the highest cost
       both in terms of infrastructure and dev time/effort...and this
       direction would not be practical if this custom hook/connection is
       ever released to the public

Is there any other mechanism which would be more appropriate for this use?

Re: How to (properly) cache data in a custom hook/connection?

Posted by Daniel Standish <da...@astronomer.io>.
I think you have it about right. XCom seems like a bad idea because it's
per-task-run and fiddly. A Variable seems like a good solution here. It
doesn't seem so gross to me.
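
For illustration, here is a minimal sketch of what the Variable-backed cache
could look like. Everything named here is hypothetical and not from any
existing hook: the function name, the key prefix, the JSON payload layout,
and the TTL are all assumptions. The storage is passed in as two callables so
the sketch runs without Airflow installed. In an Airflow 2.x deployment, I
would expect get_value to wrap Variable.get with a None default and set_value
to be Variable.set, but check that against the Airflow version you run.

```python
import json
import time
from typing import Callable, Optional


def cached_hostname(
    conn_id: str,
    discover: Callable[[str], str],
    get_value: Callable[[str], Optional[str]],
    set_value: Callable[[str, str], None],
    ttl_seconds: float = 3600.0,
    now: Callable[[], float] = time.time,
) -> str:
    """Return the discovered hostname for conn_id, reusing a cached value.

    get_value must return None when the key is missing. With Airflow 2.x
    this could plausibly be (assumption, not verified here):
        get_value=lambda k: Variable.get(k, default_var=None)
        set_value=Variable.set
    """
    # Hypothetical key naming scheme: one cache entry per connection id.
    key = f"service_discovery_cache__{conn_id}"

    raw = get_value(key)
    if raw is not None:
        try:
            entry = json.loads(raw)
            # Reuse the cached hostname only while it is younger than the TTL.
            if now() - entry["cached_at"] < ttl_seconds:
                return entry["hostname"]
        except (ValueError, KeyError, TypeError):
            # Corrupt or unexpected payload: fall through and rediscover.
            pass

    # Cache miss, expired entry, or unreadable entry: run discovery again.
    hostname = discover(conn_id)
    set_value(key, json.dumps({"hostname": hostname, "cached_at": now()}))
    return hostname
```

A hook could call this from its get_conn() with its own discovery function.
The TTL is there because a globally cached hostname would otherwise never be
refreshed if the service moves. Note that two tasks starting at the same time
may both miss the cache and both run discovery, which is harmless as long as
discovery is idempotent.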

If we ever implement state persistence (see AIP 30
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-30%3A+State+persistence>)
then you could use that.

