Posted to dev@airflow.apache.org by Rob Harrison <ro...@gmail.com> on 2017/06/19 11:11:00 UTC

Passing Variables

Hi,

I would like to pass a variable to my airflow dag and would like to know if
there is a recommended method for doing this.

I am hoping to create a dag with python operators and tasks that read data
from a parquet table, perform a calculation, then write the results into a
new table. I'd like to pass the source table name in along with the task
when calling the dag from the command line.

From what I have read, the following can be used to set a variable from
the command line:

airflow variables -s myvar "value"

Does anyone have an example of this they can share?

Thank you,
Rob

Re: Passing Variables

Posted by siddharth anand <sa...@apache.org>.
Ah... I completely missed the question in my haste to do too many things.

Assuming you have a DAG named process_my_data with 3 tasks:
read_from_source_table --> transform --> write_to_new_table. This dag
should have `schedule_interval=None` so it only runs when triggered.

You could write a script that reads your list of source tables and calls
airflow trigger_dag -c <a json string with the params you want to pass to
your first task> -e <execution date>. This will launch a dag run for each
input you pass in. I believe execution dates must differ by at least 1
second (timestamp granularity in the db), so avoid a tight loop by
sleeping 1 second between triggers.
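
As a rough sketch, the triggering script might look something like this
(the table names and the source_table conf key are made up for
illustration):

import json
import subprocess
import time

# Hypothetical list of source tables; in practice you would read these
# from wherever your table inventory lives.
SOURCE_TABLES = ['events', 'sessions', 'clicks']

for table in SOURCE_TABLES:
    # -c passes a JSON conf payload that tasks can read via dag_run.conf
    subprocess.check_call([
        'airflow', 'trigger_dag', 'process_my_data',
        '-c', json.dumps({'source_table': table}),
    ])
    # Execution dates have 1-second granularity in the db, so pause
    # between triggers rather than looping tightly.
    time.sleep(1)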

You will see N dag runs, one for each of the N source tables that you pass
in.
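
On the DAG side, the first task can read that payload from the dag_run it
was triggered with. A minimal sketch, assuming the same made-up
source_table key and the 1.x PythonOperator API:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# No schedule: this DAG only runs when triggered from the CLI.
dag = DAG('process_my_data', schedule_interval=None,
          start_date=datetime(2017, 6, 1))

def read_from_source_table(**context):
    # dag_run.conf holds the JSON passed via `airflow trigger_dag -c ...`
    table = context['dag_run'].conf['source_table']
    print('reading from %s' % table)

read_task = PythonOperator(
    task_id='read_from_source_table',
    python_callable=read_from_source_table,
    provide_context=True,  # exposes dag_run (and more) to the callable
    dag=dag)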

-s

On Tue, Jun 20, 2017 at 12:22 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:

Re: Passing Variables

Posted by Maxime Beauchemin <ma...@gmail.com>.
One DAG cannot have multiple shapes at one time, by design. You cannot
parameterize things that will affect the shape of your DAG (though note
that you can fully parameterize what happens within individual task
instances). Think about it: a DAG is one (and only one) graph. It's NOT a
shapeshifting thing.

As a workaround, and this may or may not be the right thing to do, you can
write a DAG factory function that returns a DAG object given parameters,
but any given DAG instance (with a unique dag_id) has a single shape. If
you do want to go that route, you may want to use
`schedule_interval='@once'`.
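
A minimal sketch of such a factory (all names here are hypothetical):

from datetime import datetime

from airflow import DAG

def create_dag(dag_id, source_table):
    # Each call returns a distinct DAG with its own dag_id and a fixed shape.
    dag = DAG(dag_id, schedule_interval='@once',
              start_date=datetime(2017, 6, 1))
    # ... add read/transform/write tasks bound to source_table here ...
    return dag

# Airflow discovers DAG objects at module top level, so register one per table.
for table in ['events', 'sessions']:
    globals()['process_%s' % table] = create_dag('process_%s' % table, table)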

If you think the shape of your DAG needs to change from one DAG run to the
next, you may want to re-think what is static and what is dynamic. Are your
database tables' schemas changing from one DAG run to the next? No, right?
That'd be crazy! Most likely you want to think about the shape of your DAG
the same way you think about the schema of your tables: static or slowly
changing.

Max

On Mon, Jun 19, 2017 at 4:11 AM, Rob Harrison <ro...@gmail.com> wrote:

Re: Passing Variables

Posted by siddharth anand <sa...@apache.org>.
We use Airflow variables heavily.

from airflow.models import Variable

# Load an Airflow variable as a string
ENV = Variable.get('ENV').strip()

# Load an Airflow variable as JSON and access a field named PLATFORM
PLATFORM = 'EP'
SSH_KEY = Variable.get('ep_platform_ssh_keys',
                       deserialize_json=True)[PLATFORM]


You can put this code in your dag file or in any python code your dag file
imports.
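
For completeness, the variables themselves can be set ahead of time,
either from the CLI (airflow variables -s KEY VALUE) or from code; the
keys and values below are made up:

from airflow.models import Variable

# Store a plain string value
Variable.set('ENV', 'prod')

# Store a dict as JSON so it can be read back with deserialize_json=True
Variable.set('ep_platform_ssh_keys', {'EP': '/home/airflow/.ssh/id_rsa'},
             serialize_json=True)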

-s

On Mon, Jun 19, 2017 at 4:11 AM, Rob Harrison <ro...@gmail.com> wrote: