Posted to dev@airflow.apache.org by /dev /local/ca <de...@gmail.com> on 2021/05/21 01:55:18 UTC

Can an experienced developer please take some time to create some useful/helpful getting-started documentation, with a helpful real-world example where a user starts out with an existing repo containing code that needs to be called/scheduled?

--
First, I am hoping to get an answer to this question so I can get up and
running, and second, to subsequently get the *airflow.apache.org
<http://airflow.apache.org>* web site updated with '*getting started*'
documentation that is actually helpful.

---
I have a git repo on my local machine with some python code:
*c:\repos\myrepo\src\test.py*   <== the python script I want Airflow to
run/execute on a schedule

It is hosted on github.

I have airflow installed and running ("local install") on an EC2 instance.
I can access the web page on my local dev machine:  *http://<ip>:<port>*
and login to the airflow console.

I git-cloned the repo onto the EC2 instance.

I now want Airflow to invoke a python script (test.py) on a recurring basis
(once a day, for example, at a specific time).

How do I do this?  The current instructions lead me to a dead end.
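(For context, here is my *best guess* at what such a DAG file might look like, pieced together from the example DAGs -- a sketch only. The file name, dag_id, path to my script, and schedule are all placeholders I made up; I do not know if this is the intended pattern, which is exactly what I am asking the docs to confirm:)

```python
# my_dag.py -- my guess: this file goes in Airflow's dags_folder
# (e.g. ~/airflow/dags/ on the EC2 instance)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="run_test_py",              # placeholder; shows up in the UI
    start_date=datetime(2021, 5, 1),
    schedule_interval="0 6 * * *",     # once a day at 06:00 (cron syntax)
    catchup=False,                     # don't backfill runs before today
) as dag:
    run_script = BashOperator(
        task_id="run_test_script",
        # placeholder path to my clone on the EC2 instance
        bash_command="python /home/ec2-user/repos/myrepo/src/test.py",
    )
```

If I understand the quick start correctly, Airflow scans the dags_folder and picks this up automatically -- but the docs never walk through this end to end, which is my whole point.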

---
*Details:*

I went to airflow.apache.org and on the Install page:
https://airflow.apache.org/docs/apache-airflow/stable/start/index.html

There is a link:  [*Quick Start*]

*I clicked there:*
https://airflow.apache.org/docs/apache-airflow/stable/start/index.html

I clicked 'Running Airflow locally' (my install is on an EC2 instance, not
in Docker):

https://airflow.apache.org/docs/apache-airflow/stable/start/local.html

---
I was able to get to the web page/url

I enabled 'example_bash_operator' & 'example_python_operator', and clicked
inside each to look at the '<> Code' view.

---
*===> Get this:*

At this point, I am no closer to understanding what I need to do to have
Airflow execute the code in my repo (test.py) on a schedule.

Step by step, what do I need to do to create a new job that will execute my
code?

I do not see these sample DAGs calling external code (code in another
repo).  All the python code to be executed is contained in the example
itself.
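(To make the gap concrete, here is my *guess* at how a DAG might call code that lives in a separately cloned repo -- by putting that repo on sys.path and importing it. Every path and name below is a placeholder, and the assumption that my test.py exposes a main() function is mine; nothing in the docs told me whether this is the right approach:)

```python
# call_external_repo.py -- sketch; all paths/names are placeholders
import sys
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# My guess: make the separately cloned repo importable
sys.path.insert(0, "/home/ec2-user/repos/myrepo/src")

def call_my_code():
    # Import at call time so DAG parsing doesn't break if the repo moves
    import test as my_test  # the repo's test.py
    my_test.main()          # assumes test.py exposes main() -- placeholder

with DAG(
    dag_id="call_external_repo_code",
    start_date=datetime(2021, 5, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="call_code", python_callable=call_my_code)
```

Whether this sys.path approach, packaging my code as a library, or just shelling out with a BashOperator is the recommended pattern is exactly what the getting-started docs should say.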

There are huge gaping holes in the instructions here to help someone get up
and going quickly.

--
On the Airflow home page:
http://<ip-address>:<port>/home

*There is no [+] Add DAG* (no plus button) to add a DAG.  Is that by
design?

*Also, I need help with the following:*
That would be helpful to get started, but ultimately I need to deploy jobs
programmatically to the server.

Any and all help getting me across this canyon would be appreciated.

I do not know whether I am supposed to add Airflow DAG code to my existing
repo (wrapping my test.py code with the example DAG code -- just lost here),

or whether I should create a separate 'airflow/' repo, put code there,
package my code as a library, import it, and call it from there.

*I'm just lost here, and I think most new users following these instructions would be too.*

--
Can someone who works on the Airflow project *PLEASE* take some time to
write a very minimal, step-by-step guide on executing a python script (.py)
that exists in a user's repo?

*Question:*
On my local machine, where does the DAG code live?  In my repo where the
python code to be executed lives, or is it better practice to create a
separate 'airflow' repo that then points to the python code repo?

*Question:*
In the DAG code, I only want a one-liner that runs my code (referring to
it, since it lives in another repo) - how is this accomplished?

*Question:*
Can I set an environment variable in the DAG? (that my script will read)
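(My guess here, based on the BashOperator appearing to accept an `env` argument -- if I am reading the example code right: the DAG sets the variable and my script reads it with os.environ. The variable name, value, and path below are placeholders:)

```python
# run_with_env.py -- sketch; assumes BashOperator takes an `env` mapping
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="run_with_env",
    start_date=datetime(2021, 5, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="run_test_script",
        bash_command="python /home/ec2-user/repos/myrepo/src/test.py",
        # placeholder; test.py would read os.environ["MY_SETTING"]
        env={"MY_SETTING": "some-value"},
    )
```

One thing I am unsure of and that the docs should spell out: whether passing `env` *replaces* the task's entire environment or merely adds to it.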