Posted to dev@airflow.apache.org by Bolke de Bruin <bd...@gmail.com> on 2017/01/20 09:07:05 UTC

Experiences with 1.8.0 (updated)

(Continued; I accidentally pressed send.)

This is to report back on some of the (early) experiences we have with Airflow 1.8.0 (beta 1 at the moment):

1. The UI does not show faulty DAGs, leading to confusion for developers.
Previously, when a faulty DAG was placed in the dags folder, the UI would report a parsing error. Now it doesn’t, because parsing happens separately and errors are not reported back.
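
Until parse errors show up in the UI again, a developer could check a dags folder by hand. The following is only a minimal sketch using the Python standard library, not Airflow's own parsing mechanism: the function name `find_unparsable_files` and the demo file names are hypothetical, and `compile()` catches syntax errors only, not failing imports or broken DAG definitions.

```python
import os
import tempfile


def find_unparsable_files(dags_folder):
    """Return {path: message} for each .py file under dags_folder with a syntax error."""
    errors = {}
    for root, _dirs, files in os.walk(dags_folder):
        for name in sorted(files):
            if not name.endswith(".py"):
                continue
            path = os.path.join(root, name)
            with open(path) as handle:
                source = handle.read()
            try:
                # compile() only parses the file; it does not run it or resolve imports
                compile(source, path, "exec")
            except SyntaxError as exc:
                errors[path] = "line {}: {}".format(exc.lineno, exc.msg)
    return errors


# Demo against a throwaway folder containing one valid and one broken file
demo_folder = tempfile.mkdtemp()
with open(os.path.join(demo_folder, "good_dag.py"), "w") as handle:
    handle.write("x = 1\n")
with open(os.path.join(demo_folder, "bad_dag.py"), "w") as handle:
    handle.write("def broken(:\n")

for path, message in find_unparsable_files(demo_folder).items():
    print(path, message)
```

Only the broken file is reported; a file that parses but fails at import time would still slip through this check.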

2. The Hive hook sets ‘airflow.ctx.dag_id’ in Hive
We run in a secure environment, which requires this variable to be whitelisted if it is modified (this needs to be added to UPDATING.md)

3. DagRuns do not exist for certain tasks, and this does not get fixed
The log gets flooded with messages that give no suggestion of what to do

4. At startup all running dag_runs are checked; we seemed to have a lot of “left over” dag_runs (a couple of thousand)
- Checking was logged at INFO -> requires an fsync for every log message, making it very slow
- Checking would happen at every restart, but the dag_runs’ states were not being updated
- These dag_runs would never be marked anything other than running for some reason
-> Applied a workaround: updated all dag_runs before a certain date in SQL to -> finished
-> Need to investigate why dag_runs did not get marked “finished/failed”
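
The SQL workaround could look roughly like the sketch below. It runs against an in-memory SQLite table that is only a simplified stand-in for the real metadata database, and several things are assumptions: the three-column schema, the cutoff date, the helper name `close_stale_dag_runs`, and the use of 'success' as the target state (the post says "finished"; pick whichever state value fits your installation).

```python
import sqlite3


def close_stale_dag_runs(conn, cutoff, new_state="success"):
    """Set new_state on dag_runs still 'running' before cutoff; return rows changed."""
    cursor = conn.execute(
        "UPDATE dag_run SET state = ? WHERE state = 'running' AND execution_date < ?",
        (new_state, cutoff),
    )
    conn.commit()
    return cursor.rowcount


# In-memory stand-in for the metadata database (simplified, assumed schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?)",
    [
        ("example", "2016-11-01 00:00:00", "running"),  # stale, will be updated
        ("example", "2016-12-01 00:00:00", "success"),  # already closed
        ("example", "2017-01-19 00:00:00", "running"),  # after the cutoff, left alone
    ],
)

# Hypothetical cutoff; ISO-formatted timestamps compare correctly as text
changed = close_stale_dag_runs(conn, "2017-01-01 00:00:00")
print(changed)
```

Restricting the update to rows that are both 'running' and older than the cutoff leaves recent, genuinely active runs untouched. Back up the table before trying anything like this on a real database.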

5. Our umask is set to 027, but the scheduler logging directories were created with mode 777
- Cannot reproduce this locally, so we need to investigate.
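
One thing that may be worth ruling out while investigating (an assumption, not a confirmed cause) is whether the directories get an explicit chmod after creation. The sketch below shows the difference on a POSIX system: a mode passed to `os.makedirs` is filtered by the umask, while `os.chmod` sets the mode exactly. The helper names `dir_mode` and `demo` are hypothetical.

```python
import os
import stat
import tempfile


def dir_mode(path):
    """Return the rwx permission bits of path, e.g. 0o750."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o777


def demo(base):
    """Create two directories under umask 027 and return their resulting modes."""
    old_umask = os.umask(0o027)
    try:
        via_makedirs = os.path.join(base, "via_makedirs")
        os.makedirs(via_makedirs, 0o777)  # requested mode is filtered by the umask

        via_chmod = os.path.join(base, "via_chmod")
        os.makedirs(via_chmod)
        os.chmod(via_chmod, 0o777)  # chmod sets the mode exactly, ignoring the umask

        return dir_mode(via_makedirs), dir_mode(via_chmod)
    finally:
        os.umask(old_umask)  # restore the caller's umask


makedirs_mode, chmod_mode = demo(tempfile.mkdtemp())
print(oct(makedirs_mode), oct(chmod_mode))
```

With umask 027 the first directory ends up as 750 and the second as 777, so a 777 directory points to something setting the mode explicitly or running with a different umask.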

6. Scanning the DAG dir only every 5 minutes by default seems very slow in mixed “dev/prod” environments
-> The default should be set lower (30s), with maybe 300s as the best practice for prod environments
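
For reference, a sketch of what the lower interval would look like in configuration. It assumes the scan interval is controlled by the `dag_dir_list_interval` option in the `[scheduler]` section of airflow.cfg (check your version's default config), and it uses Python's `configparser` only to show that the fragment parses as expected.

```python
import configparser

# Hypothetical airflow.cfg fragment for a mixed dev/prod environment
CFG = """
[scheduler]
# Seconds between scans of the DAGs directory for new files
dag_dir_list_interval = 30
"""

parser = configparser.ConfigParser()
parser.read_string(CFG)
interval = parser.getint("scheduler", "dag_dir_list_interval")
print(interval)
```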


That’s it for now. Nothing is really a showstopper, I guess, but #4 is something we need to take care of. The rest can be fixed with small updates or good documentation.

I will release Beta 2 today; it will contain the major cgroups+impersonation feature, but will not yet contain fixes for the above.

Bolke