Posted to commits@airflow.apache.org by "nathan warshauer (JIRA)" <ji...@apache.org> on 2017/11/29 22:48:01 UTC

[jira] [Created] (AIRFLOW-1868) Packaged Dags not added to dag table, unable to execute tasks

nathan warshauer created AIRFLOW-1868:
-----------------------------------------

             Summary: Packaged Dags not added to dag table, unable to execute tasks
                 Key: AIRFLOW-1868
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1868
             Project: Apache Airflow
          Issue Type: Bug
         Environment: airflow 1.8.2, celery, rabbitMQ, mySQL, aws
            Reporter: nathan warshauer
         Attachments: Screen Shot 2017-11-29 at 2.31.02 PM.png, Screen Shot 2017-11-29 at 4.40.39 PM.png, Screen Shot 2017-11-29 at 4.42.39 PM.png

.zip files in the dags directory do not appear to be added to the dag table in the Airflow database.  When a .zip file containing executable .py files is placed in the dags folder, the dag_id should be added to the dag table, and Airflow should allow the DAG to be unpaused and run through the web server.
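To rule out a packaging problem, the layout of the archive can be checked first. This is only a minimal sketch, and it assumes that Airflow picks up packaged DAG modules only from the top level of the .zip (my understanding of how packaged DAGs are loaded). The function name root_level_modules and the archive path in the usage note are hypothetical.

```python
import os
import zipfile


def root_level_modules(zip_path):
    """Return the .py files stored at the top level of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Keep only entries that have no directory component.
            if not os.path.dirname(name) and name.endswith(".py")
        )
```

Calling root_level_modules("/path/to/dags/my_dags.zip") (a placeholder path) should list the DAG definition file. If the list is empty, the DAG files are probably nested inside a subdirectory of the archive.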
Running SELECT distinct dag.dag_id AS dag_dag_id FROM dag confirms that the DAG does not exist in the dag table. It still shows up in the UI with the warning message "This Dag seems to be existing only locally", even though the DAG exists in all 3 dags directories (master and two workers) and airflow.cfg has donot_pickle = True.
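The check against the dag table can be scripted along these lines. This is a sketch only: it uses an in-memory SQLite database as a stand-in for the real MySQL metadata database, the two-column table is a simplified assumption rather than the full Airflow schema, and missing_dag_ids and 'SOME-OTHER-DAG' are hypothetical names used for illustration.

```python
import sqlite3


def missing_dag_ids(conn, expected_dag_ids):
    """Return the expected dag_ids that have no row in the dag table."""
    rows = conn.execute("SELECT DISTINCT dag_id FROM dag").fetchall()
    registered = {row[0] for row in rows}
    return sorted(set(expected_dag_ids) - registered)


# Stand-in database (assumption): the real metadata database here is MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
conn.execute("INSERT INTO dag VALUES ('SOME-OTHER-DAG', 0)")

# The packaged DAG is expected but has no row in the table.
print(missing_dag_ids(conn, ["MY-DAG-NAME", "SOME-OTHER-DAG"]))
```

Against the real database, the same function should work with any DB-API connection whose execute call returns a cursor; with drivers that require an explicit cursor, the query would need to go through conn.cursor() instead.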
When the DAG is triggered manually via airflow trigger_dag <dag_id>, the run shows up on the web server but no tasks execute.  If I go to the task and click start through the UI, the task executes successfully and shows the attached state upon completion.  If I do not do this, the tasks never enter the queue and the run sits idle, as the 3rd attached image shows.
Basically, the DAG CAN run manually from the zip, BUT the scheduler and the underlying database tables do not appear to function correctly for packaged DAGs.
Please let me know if I can provide any additional information regarding this issue, or if you have any leads that I can check out to resolve it.

from datetime import datetime, timedelta

from airflow import DAG

# default_args must be defined before it is passed to DAG().
default_args = {
  'depends_on_past': False,
  'email': ['airflow@airflow.com'],
  'email_on_failure': True,
  'email_on_retry': False,
  'owner': 'airflow',
  'provide_context': True,
  'retries': 0,
  'retry_delay': timedelta(minutes=5),
  'start_date': datetime(2017, 11, 28)
}

# Runs every 5 minutes, one active run at a time, timing out after 4m30s.
dag = DAG('MY-DAG-NAME',
  default_args=default_args,
  schedule_interval='*/5 * * * *',
  max_active_runs=1,
  dagrun_timeout=timedelta(minutes=4, seconds=30))



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)