Posted to commits@airflow.apache.org by "Brian Runk (JIRA)" <ji...@apache.org> on 2018/03/01 22:30:00 UTC

[jira] [Created] (AIRFLOW-2164) Allow different DAG path on celery worker hosts

Brian Runk created AIRFLOW-2164:
-----------------------------------

             Summary: Allow different DAG path on celery worker hosts
                 Key: AIRFLOW-2164
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2164
             Project: Apache Airflow
          Issue Type: Improvement
          Components: celery
    Affects Versions: Airflow 1.8
         Environment: I'm running Airflow 1.8 with Celery 4.1.0 and librabbitmq 1.6.1 (RabbitMQ 3.7.3) on SUSE Linux Enterprise Server 11.

$ python --version
Python 2.7.14 :: Anaconda, Inc.

$ uname -a
Linux <hostname> 3.0.101-0.47.106.5-default #1 SMP Thu Aug 31 20:12:46 UTC 2017 (c4febfa) x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/SuSE-release 
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3

$ pip list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
airflow (1.8.0)
alembic (0.8.10)
amqp (2.2.2)
asn1crypto (0.24.0)
Babel (2.5.3)
backports-abc (0.5)
billiard (3.5.0.3)
celery (4.1.0)
certifi (2018.1.18)
cffi (1.11.4)
chardet (3.0.4)
click (6.7)
conda (4.4.10)
croniter (0.3.20)
cryptography (2.1.4)
dill (0.2.7.1)
docutils (0.14)
enum34 (1.1.6)
Flask (0.11.1)
Flask-Admin (1.4.1)
Flask-Cache (0.13.1)
Flask-Login (0.2.11)
flask-swagger (0.2.13)
Flask-WTF (0.12)
flower (0.9.2)
funcsigs (1.0.0)
future (0.15.2)
futures (3.2.0)
gitdb2 (2.0.3)
GitPython (2.1.8)
gunicorn (19.3.0)
idna (2.6)
ipaddress (1.0.19)
itsdangerous (0.24)
Jinja2 (2.8.1)
kombu (4.1.0)
librabbitmq (1.6.1)
lockfile (0.12.2)
lxml (3.8.0)
Mako (1.0.7)
Markdown (2.6.11)
MarkupSafe (1.0)
MySQL-python (1.2.5)
mysqlclient (1.3.12)
numpy (1.14.1)
ordereddict (1.1)
pandas (0.22.0)
pip (9.0.1)
psutil (4.4.2)
pycosat (0.6.3)
pycparser (2.18)
Pygments (2.2.0)
pyOpenSSL (17.5.0)
PySocks (1.6.7)
python-daemon (2.1.2)
python-dateutil (2.6.1)
python-editor (1.0.3)
python-nvd3 (0.14.2)
python-slugify (1.1.4)
pytz (2018.3)
PyYAML (3.12)
requests (2.18.4)
ruamel-yaml (0.15.35)
setproctitle (1.1.10)
setuptools (38.4.0)
singledispatch (3.4.0.3)
six (1.11.0)
smmap2 (2.0.3)
SQLAlchemy (1.2.4)
tabulate (0.7.7)
thrift (0.9.3)
tornado (4.5.3)
Unidecode (1.0.22)
urllib3 (1.22)
vine (1.1.4)
Werkzeug (0.14.1)
wheel (0.30.0)
WTForms (2.1)
zope.deprecation (4.3.0)

            Reporter: Brian Runk
             Fix For: Airflow 1.8


I cannot have the same AIRFLOW_HOME/dags path on my worker host as on my master host: I have never had root on these hosts (so this is a non-root install of everything), and the disk layouts differ between them.

My tasks get scheduled but don't run when they are on a non-default queue (I discovered this while trying to use a host-specific queue). I think I tracked it down to the DagBag path being passed from the master host but not existing on the worker host, i.e.

[2018-03-01 16:53:26,714] {models.py:167} INFO - Filling up the DagBag from /data/algodev/runkbri/home/airflow_data/dags/runktest.py
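In other words, the worker receives the master's absolute DAG file path and tries to parse it locally, where it does not exist. One conceivable workaround (a sketch only; `remap_dag_path` is a hypothetical helper, not part of Airflow) would be to translate the master's path into the equivalent path under the worker's own dags folder:

```python
import os

def remap_dag_path(master_path, master_dags_root, worker_dags_root):
    """Map a DAG file path from the master's dags root onto this
    worker's dags root, preserving the relative layout.
    Illustrative only; not an Airflow API."""
    rel = os.path.relpath(master_path, master_dags_root)
    return os.path.join(worker_dags_root, rel)

print(remap_dag_path(
    "/data/algodev/runkbri/home/airflow_data/dags/runktest.py",
    "/data/algodev/runkbri/home/airflow_data/dags",
    "/data/test/runkbri/home/airflow_data/dags"))
# -> /data/test/runkbri/home/airflow_data/dags/runktest.py
```

This only works if the two hosts keep the same layout below their respective dags roots, which is exactly the assumption Airflow's shared-path requirement makes in the first place.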

I tried running from the command line like this:

$ airflow run runk_tutorial runk_date 2018-03-01T15:23:14.233926 --local -sd /data/algodev/runkbri/home/airflow_data/dags/runktest.py
[2018-03-01 17:13:05,174] {__init__.py:57} INFO - Using executor CeleryExecutor
Traceback (most recent call last):
  File "/data/test/runkbri/home/airflow-1.8/bin/airflow", line 28, in <module>
    args.func(args)
  File "/data/test/runkbri/home/airflow-1.8/lib/python2.7/site-packages/airflow/bin/cli.py", line 388, in run
    dag = get_dag(args)
  File "/data/test/runkbri/home/airflow-1.8/lib/python2.7/site-packages/airflow/bin/cli.py", line 126, in get_dag
    'parse.'.format(args.dag_id))
airflow.exceptions.AirflowException: dag_id could not be found: runk_tutorial. Either the dag did not exist or it failed to parse.

but if I change it to --ship_dag instead of -sd /data/algodev/runkbri/home/airflow_data/dags/runktest.py, it works:

$  airflow run runk_tutorial runk_templated 2018-03-01T15:23:14.233926 --local --ship_dag
[2018-03-01 17:14:31,116] {__init__.py:57} INFO - Using executor CeleryExecutor

Logging into: /data/test/runkbri/home/airflow_data/logs/runk_tutorial/runk_templated/2018-03-01T15:23:14.233926
(works with this config)

I tweaked models.py slightly to remove the -sd argument and add --ship_dag, and it seems to work, but I'm a little concerned that I don't know what else I may have broken with this hack.

# commented out this line
#        cmd.extend(["-sd", file_path]) if file_path else None
# added this line in its place
        cmd.extend(["--ship_dag"])
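A more defensive variant of this hack might keep -sd when the DAG file actually exists on the local host and only fall back to --ship_dag otherwise. The sketch below is my own illustration of that idea, not upstream Airflow code; `dag_locator_args` is a hypothetical helper standing in for the inline `cmd.extend(...)` calls in models.py:

```python
import os

def dag_locator_args(file_path):
    """Return the CLI args for locating the DAG: prefer the local
    -sd path when it exists on this host, otherwise ship the
    pickled DAG from the master. (Sketch of the models.py tweak;
    not upstream Airflow behavior.)"""
    if file_path and os.path.exists(file_path):
        return ["-sd", file_path]
    return ["--ship_dag"]

cmd = ["airflow", "run", "runk_tutorial", "runk_date",
       "2018-03-01T15:23:14.233926", "--local"]
# On a worker where the master's path is absent, this appends --ship_dag.
cmd.extend(dag_locator_args(
    "/data/algodev/runkbri/home/airflow_data/dags/runktest.py"))
```

Note that --ship_dag relies on DAG pickling, so DAGs that aren't picklable would still need the file present locally.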





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)