Posted to commits@airflow.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/09/02 18:01:02 UTC

[jira] [Commented] (AIRFLOW-2642) [kubernetes executor worker] the value of git-sync init container ENV GIT_SYNC_ROOT is wrong

    [ https://issues.apache.org/jira/browse/AIRFLOW-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601393#comment-16601393 ] 

Apache Spark commented on AIRFLOW-2642:
---------------------------------------

User 'Cplo' has created a pull request for this issue:
https://github.com/apache/incubator-airflow/pull/3519

> [kubernetes executor worker] the value of git-sync init container ENV GIT_SYNC_ROOT is wrong
> --------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2642
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2642
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib
>    Affects Versions: 2.0.0, 1.10
>            Reporter: pengchen
>            Assignee: pengchen
>            Priority: Major
>             Fix For: 1.10
>
>
> There are two ways of syncing DAGs: pvc and git-sync. When we use git-sync, the generated worker pod YAML fragments are as follows.
>  
> {code:java}
> worker container:
> -------------------------------
> containers:
> - args:
>   - airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local -sd
>     /root/airflow/dags/dags/example_dags/tutorial1.py
>   command:
>   - bash
>   - -cx
>   - --
>   env:
>   - name: AIRFLOW__CORE__AIRFLOW_HOME
>     value: /root/airflow
>   - name: AIRFLOW__CORE__EXECUTOR
>     value: LocalExecutor
>   - name: AIRFLOW__CORE__DAGS_FOLDER
>     value: /tmp/dags
>   - name: SQL_ALCHEMY_CONN
>     valueFrom:
>       secretKeyRef:
>         key: sql_alchemy_conn
>         name: airflow-secrets
>
> init container:
> -------------------------------
> initContainers:
> - env:
>   - name: GIT_SYNC_REPO
>     value: https://code.devops.xiaohongshu.com/pengchen/Airflow-DAGs.git
>   - name: GIT_SYNC_BRANCH
>     value: master
>   - name: GIT_SYNC_ROOT
>     value: /tmp
>   - name: GIT_SYNC_DEST
>     value: dags
>   - name: GIT_SYNC_ONE_TIME
>     value: "true"
>   - name: GIT_SYNC_USERNAME
>     value: XXX
>   - name: GIT_SYNC_PASSWORD
>     value: XXX
>   image: library/git-sync-amd64:v2.0.5
>   imagePullPolicy: IfNotPresent
>   name: git-sync-clone
>   resources: {}
>   securityContext:
>     runAsUser: 0
>   terminationMessagePath: /dev/termination-log
>   terminationMessagePolicy: File
>   volumeMounts:
>   - mountPath: /root/airflow/dags/
>     name: airflow-dags
>   - mountPath: /root/airflow/logs
>     name: airflow-logs
>   - mountPath: /root/airflow/airflow.cfg
>     name: airflow-config
>     readOnly: true
>     subPath: airflow.cfg
>   - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
>     name: default-token-xz87t
>     readOnly: true
> {code}
> According to this configuration, git-sync will synchronize the DAGs into the /tmp/dags directory. However, the worker container's command args (airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local -sd /root/airflow/dags/dags/example_dags/tutorial1.py) are generated by the scheduler and point at /root/airflow/dags/dags instead. Therefore, the task fails as follows:
> {code:java}
> + airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local -sd /root/airflow/dags/dags/example_dags/tutorial1.py
> [2018-06-19 07:57:29,075] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
> [2018-06-19 07:57:29,232] {__init__.py:51} INFO - Using executor LocalExecutor
> [2018-06-19 07:57:29,373] {models.py:219} INFO - Filling up the DagBag from /root/airflow/dags/dags/example_dags/tutorial1.py
> [2018-06-19 07:57:29,648] {models.py:310} INFO - File /usr/local/lib/python2.7/dist-packages/airflow/example_dags/__init__.py assumed to contain no DAGs. Skipping.
> Traceback (most recent call last):
> File "/usr/local/bin/airflow", line 32, in <module>
> args.func(args)
> File "/usr/local/lib/python2.7/dist-packages/airflow/utils/cli.py", line 74, in wrapper
> return f(*args, **kwargs)
> File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 475, in run
> dag = get_dag(args)
> File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 146, in get_dag
> 'parse.'.format(args.dag_id))
> airflow.exceptions.AirflowException: dag_id could not be found: tutorial1. Either the dag did not exist or it failed to parse.
> {code}
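> 
> To make the path mismatch concrete, here is a minimal sketch (the variable names are illustrative, not Airflow internals), assuming git-sync v2 publishes the checkout at GIT_SYNC_ROOT/GIT_SYNC_DEST:
> {code:python}
> import os
> 
> # Values taken from the generated pod spec above.
> git_sync_root = "/tmp"                         # GIT_SYNC_ROOT on the init container
> git_sync_dest = "dags"                         # GIT_SYNC_DEST on the init container
> dag_volume_mount_path = "/root/airflow/dags"   # where the airflow-dags volume is mounted
> 
> # Where the DAGs actually land vs. where the scheduler-generated -sd path looks.
> where_dags_land = os.path.join(git_sync_root, git_sync_dest)             # /tmp/dags
> where_worker_looks = os.path.join(dag_volume_mount_path, git_sync_dest)  # /root/airflow/dags/dags
> 
> assert where_dags_land != where_worker_looks  # hence "dag_id could not be found"
> {code}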
>  
> The log shows that the worker cannot find the corresponding DAG, so I think the environment variable GIT_SYNC_ROOT should be kept consistent with dag_volume_mount_path.
> The worker's environment variable AIRFLOW__CORE__DAGS_FOLDER is also incorrect, and so is AIRFLOW__CORE__EXECUTOR.
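> 
> A minimal sketch of that proposed change, under the same assumptions as above (illustrative names, not the actual patch in the PR):
> {code:python}
> import os
> 
> dag_volume_mount_path = "/root/airflow/dags"   # the airflow-dags volume mount
> git_sync_dest = "dags"
> 
> # Proposed fix: point GIT_SYNC_ROOT at the dags volume mount path instead of
> # the hard-coded /tmp, so the checkout lands where the -sd path expects it.
> git_sync_root = dag_volume_mount_path
> where_dags_land = os.path.join(git_sync_root, git_sync_dest)
> assert where_dags_land == "/root/airflow/dags/dags"  # matches the scheduler's -sd path
> {code}
> Since the init container already mounts the airflow-dags volume at /root/airflow/dags/, writing the checkout there makes it visible to the worker container through the shared volume.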
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)