Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/05/07 19:32:38 UTC
[GitHub] [airflow] bolkedebruin commented on issue #5254: [AIRFLOW-4473] Add
papermill operator
URL: https://github.com/apache/airflow/pull/5254#issuecomment-490224686
Papermill is awesome! Consider the following DAG:
```
import airflow
from airflow.models import DAG
from airflow.operators.papermill_operator import PapermillOperator
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta

args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(2)
}

dag = DAG(
    dag_id='example_papermill_operator', default_args=args,
    schedule_interval='0 0 * * *',
    dagrun_timeout=timedelta(minutes=60))

run_this = PapermillOperator(
    task_id="run_example_notebook",
    dag=dag,
    input_nb="/tmp/hello_world.ipynb",
    output_nb="/tmp/out-{{ execution_date }}.ipynb",
    parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"}
)

if __name__ == "__main__":
    dag.cli()
```
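To make the templated fields concrete, here is a minimal sketch of how `{{ execution_date }}` in `output_nb` and `parameters` could resolve for a single run. Airflow renders these fields with Jinja; the `render` helper below is a hypothetical regex-based stand-in that uses only the standard library, and both the run date and the exact string form of `execution_date` are assumptions made for illustration.

```python
import re
from datetime import datetime, timezone


def render(template, context):
    """Replace each {{ name }} placeholder with str(context[name])."""
    def substitute(match):
        return str(context[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical run date; in a real run the scheduler supplies this value.
context = {
    "execution_date": datetime(2019, 5, 5, tzinfo=timezone.utc).isoformat()
}

# The same two templated values as in the DAG above.
output_nb = render("/tmp/out-{{ execution_date }}.ipynb", context)
parameters = {
    key: render(value, context)
    for key, value in {"msgs": "Ran from Airflow at {{ execution_date }}!"}.items()
}

print(output_nb)
print(parameters["msgs"])
```

Because the execution date ends up in the file name, each scheduled run writes its own output notebook instead of overwriting the previous one.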
The simple notebook looks like this:
```
msgs = "Hello!"  # <-- parameterized cell
print(msgs)
```
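As a rough illustration of what "parameterized" means here: Papermill looks for the cell tagged `parameters` and adds a new cell right after it that overrides the defaults. The sketch below mimics that on a plain notebook-shaped dict using only the standard library. `inject_parameters` is a hypothetical helper, not Papermill's API, and the notebook structure is trimmed down to the fields the sketch needs.

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of the notebook with an override cell after the 'parameters' cell."""
    result = copy.deepcopy(notebook)
    cells = result["cells"]
    # Index of the first cell tagged "parameters", or -1 if there is none
    # (in which case the override cell is inserted at the top).
    index = next(
        (i for i, cell in enumerate(cells)
         if "parameters" in cell.get("metadata", {}).get("tags", [])),
        -1,
    )
    # One assignment per parameter, written as Python source.
    source = "\n".join(
        f"{name} = {value!r}" for name, value in parameters.items()
    )
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
    }
    cells.insert(index + 1, injected)
    return result


# Notebook-shaped dict matching the two-line notebook above.
notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": 'msgs = "Hello!"'},
        {"cell_type": "code", "metadata": {}, "source": "print(msgs)"},
    ]
}

patched = inject_parameters(notebook, {"msgs": "Ran from Airflow!"})
for cell in patched["cells"]:
    print(cell["source"])
```

The default `msgs = "Hello!"` stays in place, so the notebook still runs on its own; the injected assignment simply runs later and wins.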
BTW: you will also like this in the context of Amundsen. This operator automatically generates lineage information. If you implement your own lineage client in Airflow, you can integrate this with Neo4j/Elastic. Atlas is already supported ;-) (Overall, the lineage capability in Airflow needs some love, and usage needs to guide it.)