Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/05/07 19:32:38 UTC

[GitHub] [airflow] bolkedebruin commented on issue #5254: [AIRFLOW-4473] Add papermill operator

bolkedebruin commented on issue #5254: [AIRFLOW-4473] Add papermill operator
URL: https://github.com/apache/airflow/pull/5254#issuecomment-490224686
 
 
   Papermill is awesome! Consider the following DAG:
   
   ```
   from datetime import timedelta
   
   import airflow
   from airflow.models import DAG
   from airflow.operators.papermill_operator import PapermillOperator
   
   args = {
       'owner': 'airflow',
       'start_date': airflow.utils.dates.days_ago(2)
   }
   
   dag = DAG(
       dag_id='example_papermill_operator', default_args=args,
       schedule_interval='0 0 * * *',
       dagrun_timeout=timedelta(minutes=60))
   
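   # Run the input notebook and write the executed copy to output_nb.
   # The Jinja expressions are rendered per DAG run, so each run gets its
   # own output notebook and its own value for the "msgs" parameter.
   run_this = PapermillOperator(
       task_id="run_example_notebook",
       dag=dag,
       input_nb="/tmp/hello_world.ipynb",
       output_nb="/tmp/out-{{ execution_date }}.ipynb",
       parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"}
   )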
   
   if __name__ == "__main__":
       dag.cli()
   
   ```
   
   The simple notebook looks like this:
   
   ```
   msgs = "Hello!"  # <-- parameterized cell (tagged "parameters")
   
   print(msgs)
   ```
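
To make the parameterization concrete, here is a rough sketch of what happens to that notebook when it is run with `parameters={"msgs": ...}`. It is not papermill's actual code: `inject_parameters` and the hand-written notebook dict are hypothetical stand-ins, and it assumes only the documented behaviour that papermill adds a cell tagged `injected-parameters` right after the cell tagged `parameters` (or at the top if no such cell exists).

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of `notebook` with an extra cell that overrides defaults.

    Simplified illustration only; the real papermill does considerably more
    (kernel handling, output capture, writing the result notebook, ...).
    """
    nb = copy.deepcopy(notebook)
    source = "# Parameters\n" + "".join(
        "{} = {!r}\n".format(name, value)
        for name, value in sorted(parameters.items())
    )
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    # Index of the first cell tagged "parameters"; -1 means "insert at top".
    index = next(
        (i for i, cell in enumerate(nb["cells"])
         if "parameters" in cell.get("metadata", {}).get("tags", [])),
        -1,
    )
    nb["cells"].insert(index + 1, injected)
    return nb


# Hand-written stand-in for /tmp/hello_world.ipynb (nbformat 4 layout).
hello_world = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": 'msgs = "Hello!"\n', "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {},
         "source": "print(msgs)\n", "outputs": [], "execution_count": None},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

result = inject_parameters(hello_world, {"msgs": "Ran from Airflow!"})

# Emulate running the cells top to bottom: the injected cell runs after the
# default, so print(msgs) sees the value passed in from the operator.
namespace = {}
for cell in result["cells"]:
    exec(cell["source"], namespace)
```

Because the injected cell comes after the default assignment, the notebook still runs unchanged on its own and only picks up the Airflow-supplied value when executed through papermill.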
   
   BTW: you will also like this in the context of Amundsen. This operator automatically generates lineage information. If you implement your own lineage client in Airflow, you can integrate this with Neo4j/Elastic. Atlas is already supported ;-) (Overall, the lineage capability in Airflow needs some love, and usage needs to guide it.)
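
As a rough idea of what "your own lineage client" could look like, here is a stand-alone sketch. It deliberately does not import Airflow: the `send_lineage(operator, inlets, outlets, context)` shape is meant to mirror the hook on Airflow's lineage backend as I understand it, and everything else (`InMemoryLineageBackend`, `FakeOperator`, plain string paths instead of Airflow dataset objects, the rendered output filename) is a hypothetical stand-in. A real client would write the edges to Neo4j or Elasticsearch rather than to a list.

```python
class InMemoryLineageBackend(object):
    """Hypothetical lineage client that records edges as (source, target)."""

    def __init__(self):
        self.edges = []

    def send_lineage(self, operator=None, inlets=None, outlets=None,
                     context=None):
        # Each inlet feeds the task; the task produces each outlet.
        task_id = getattr(operator, "task_id", "unknown")
        for inlet in inlets or []:
            self.edges.append((inlet, task_id))
        for outlet in outlets or []:
            self.edges.append((task_id, outlet))


class FakeOperator(object):
    """Stand-in for the PapermillOperator instance from the DAG above."""
    task_id = "run_example_notebook"


backend = InMemoryLineageBackend()
backend.send_lineage(
    operator=FakeOperator(),
    inlets=["/tmp/hello_world.ipynb"],
    # Illustrative rendered name; the real one depends on execution_date.
    outlets=["/tmp/out-2019-05-07.ipynb"],
)

for source, target in backend.edges:
    print("{} -> {}".format(source, target))
```

The two recorded edges (input notebook to task, task to output notebook) are the kind of graph that could then be loaded into a metadata store.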
   
   
