Posted to commits@airflow.apache.org by "Ravi Agarwal (JIRA)" <ji...@apache.org> on 2019/04/15 06:08:00 UTC

[jira] [Created] (AIRFLOW-4315) Improve Airflow's Experimental API for all kinds of monitoring requirements

Ravi Agarwal created AIRFLOW-4315:
-------------------------------------

             Summary: Improve Airflow's Experimental API for all kinds of monitoring requirements
                 Key: AIRFLOW-4315
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4315
             Project: Apache Airflow
          Issue Type: Improvement
          Components: api
    Affects Versions: 1.10.3
            Reporter: Ravi Agarwal
            Assignee: Ravi Agarwal
             Fix For: 1.10.4


*Goal*

 

We want to contribute to Airflow’s experimental API by extending its set of endpoints to enable monitoring of DAGs. To achieve this, we would like Airflow to have the following capabilities:



A) Get the list of all DAGs in airflow

B) Get the details for a particular DAG identified by a dag_id

C) Get the list of tasks, with their sequence details, for a particular DAG identified by a dag_id
D) Get details for a task identified by dag_id and task_id.

Historic Results for DAG Run:
E) Get details of all dag_runs for a particular DAG identified by a dag_id

F) Get details of a specific dag_run for a particular DAG identified by dag_id and execution_date

G) Get details of all dag_runs filtered by various parameters like state, execution time, execution interval etc.


Historic Results for Task Instance:
-H) Get list of task_instances for a dag_run of a particular DAG identified by a dag_id and execution_date-
I) Get details of a task_instance for a dag_run of a DAG identified by dag_id, execution_date and task_id

J) Get details of all task_instances filtered by various parameters like state, execution interval etc.


Logs Monitoring:
K) Get logs pertaining to a particular task_instance identified by dag_id, execution_date and task_id
h1. What already exists in Airflow's experimental API and changes which would improve the usability:

 

E1.

    GET /api/experimental/dags/<DAG_ID>/dag_runs

    GET /api/experimental/dags/<DAG_ID>/dag_runs?state=

            Returns a list of Dag Runs for a specific DAG ID.

 

This satisfies requirement ‘E’, but it would be good to add more filters to this endpoint, such as filtering dag_runs that ran within a given time interval, before a given time, or after a given time, or whose state is NOT equal to a given state.

 

    Proposal -


        GET /api/experimental/dags/<DAG_ID>/dag_runs?state_not_equal=    

        GET /api/experimental/dags/<DAG_ID>/dag_runs?execution_before=    

        GET /api/experimental/dags/<DAG_ID>/dag_runs?execution_after=    
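
A minimal sketch of how the proposed filters could combine, written as plain Python over in-memory records rather than against Airflow's models. The function name, the record fields, and the choice of strict (exclusive) comparisons for execution_before and execution_after are assumptions for illustration only; the proposal does not specify them.

```python
from datetime import datetime

# Assumed timestamp format, taken from the execution_date examples in this issue.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"

# Hypothetical records standing in for dag_runs; field names are assumptions.
SAMPLE_RUNS = [
    {"dag_id": "example", "execution_date": "2019-04-13T00:00:00", "state": "success"},
    {"dag_id": "example", "execution_date": "2019-04-14T00:00:00", "state": "failed"},
    {"dag_id": "example", "execution_date": "2019-04-15T00:00:00", "state": "running"},
]


def filter_dag_runs(dag_runs, state=None, state_not_equal=None,
                    execution_before=None, execution_after=None):
    """Return the runs that pass every filter that was supplied."""
    before = datetime.strptime(execution_before, DATE_FORMAT) if execution_before else None
    after = datetime.strptime(execution_after, DATE_FORMAT) if execution_after else None

    selected = []
    for run in dag_runs:
        when = datetime.strptime(run["execution_date"], DATE_FORMAT)
        if state is not None and run["state"] != state:
            continue
        if state_not_equal is not None and run["state"] == state_not_equal:
            continue
        # Assumption: both bounds are exclusive.
        if before is not None and when >= before:
            continue
        if after is not None and when <= after:
            continue
        selected.append(run)
    return selected
```

Supplying both execution_after and execution_before gives the "within a given time interval" case described above.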





E2.

    GET /api/experimental/dags/<dag_id>/dag_runs/<execution_date>

Returns a JSON with a dag_run’s public instance variables. The format for the <execution_date> is expected to be “YYYY-mm-DDTHH:MM:SS”, for example: “2016-11-16T11:34:15”.
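
For illustration, the <execution_date> format described above corresponds to the strptime pattern "%Y-%m-%dT%H:%M:%S". The helper below is a hypothetical client-side sketch, not part of Airflow; it only shows how a caller might validate the value before building the URL.

```python
from datetime import datetime

# strptime pattern matching "YYYY-mm-DDTHH:MM:SS" as described in this issue.
EXECUTION_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_execution_date(value):
    """Parse an execution_date string; raises ValueError if it does not match."""
    return datetime.strptime(value, EXECUTION_DATE_FORMAT)


def format_execution_date(moment):
    """Render a datetime back into the same string format."""
    return moment.strftime(EXECUTION_DATE_FORMAT)
```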

 

This endpoint is a good candidate to satisfy requirement ‘F’, but it returns only the state of the identified dag_run. It should return many more details about the dag_run than just its state.

 

    Proposal :

 

Modify the response object of this endpoint to return the same details that [/dags/<DAG_ID>/dag_runs] returns for each object in its list.

 

E3.

    GET /api/experimental/dags/<DAG_ID>/tasks/<TASK_ID>    

Returns info for a task.

 

This endpoint satisfies requirement ‘D’ and can be improved by adding which tasks are upstream and downstream of the identified one. The operator type would also be useful in the response.

 

    Proposal :

       

        Add operator type, list of upstream tasks & list of downstream tasks to the response object to increase usability.
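
One possible shape for the extended response is sketched below. The key names (operator_type, upstream_task_ids, downstream_task_ids), the helper, and the edge-list representation of the DAG are all assumptions for illustration; the actual field names would be decided during implementation.

```python
def describe_task(task_id, operator_types, edges):
    """Build a hypothetical response body for one task.

    operator_types: dict mapping task_id -> operator class name (assumed input).
    edges: iterable of (upstream_task_id, downstream_task_id) pairs (assumed input).
    """
    upstream = sorted(up for up, down in edges if down == task_id)
    downstream = sorted(down for up, down in edges if up == task_id)
    return {
        "task_id": task_id,
        "operator_type": operator_types.get(task_id),
        "upstream_task_ids": upstream,
        "downstream_task_ids": downstream,
    }


# Example with a three-step chain: extract -> transform -> load.
EXAMPLE_EDGES = [("extract", "transform"), ("transform", "load")]
EXAMPLE_OPERATORS = {
    "extract": "BashOperator",
    "transform": "PythonOperator",
    "load": "BashOperator",
}
EXAMPLE_RESPONSE = describe_task("transform", EXAMPLE_OPERATORS, EXAMPLE_EDGES)
```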

 

E4.

    GET /api/experimental/dags/<DAG_ID>/dag_runs/<execution_date>/tasks/<TASK_ID>       

Returns a JSON with a task instance’s public instance variables. The format for the <execution_date> is expected to be “YYYY-mm-DDTHH:MM:SS”, for example: “2016-11-16T11:34:15”.

 

This endpoint satisfies requirement ‘I’ and can be improved by adding details of which tasks were executed upstream and downstream. From a monitoring perspective, it would also be useful to return the number of attempts made so far if the task is running, or the total number of attempts if it failed or succeeded.

 

    Proposal :

       

Add list of upstream task_instances, list of downstream task_instances, and number of attempts to the response object to increase usability.

 

E5.

    GET /api/experimental/latest_runs

Returns the latest DagRun for each DAG formatted for the UI.

 

    This endpoint partially satisfies requirement ‘G’.





h1. Proposed New Endpoints to Airflow API:

 

N1.

    GET /api/experimental/dags                

    GET /api/experimental/dags?is_paused=

        Return a list of all available DAGs; the list can also be filtered by the paused state of the DAGs.

 

    To satisfy requirement ‘A’.

       

N2.

    GET /api/experimental/dags/<DAG_ID>

        Return information for a specific DAG

 

    To satisfy requirement ‘B’.

 

N3.

    GET /api/experimental/dags/<DAG_ID>/tasks

        Return a list of all tasks that are part of the DAG identified by DAG_ID

    

    To satisfy requirement ‘C’.

 

-N4.-

    -GET /api/experimental/dags/<DAG_ID>/dag_runs/<execution_date>/tasks-

    -    Return a list of all task_instances part of the DAG_ID's particular dag_run-

 

    -To satisfy requirement ‘H’.-

    


N5.

    GET /api/experimental/dag_runs

    GET /api/experimental/dag_runs?state=

    GET /api/experimental/dag_runs?state_not_equal=

    GET /api/experimental/dag_runs?execution_date_before=

    GET /api/experimental/dag_runs?execution_date_after=

    GET /api/experimental/dag_runs?dag_id=

 

    To satisfy requirement ‘G’. This endpoint will act as a generalized filter to search for dag_runs.
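
As a sketch of how a client might call the proposed endpoint, the helper below assembles the query string from the filter names listed above. The function name and the base URL in the test are hypothetical, and the endpoint itself is only a proposal at this point.

```python
from urllib.parse import urlencode

# Filter names taken from the proposed N5 endpoint above.
ALLOWED_DAG_RUN_FILTERS = {
    "state",
    "state_not_equal",
    "execution_date_before",
    "execution_date_after",
    "dag_id",
}


def dag_runs_url(base_url, **filters):
    """Build a URL for the proposed /api/experimental/dag_runs endpoint."""
    unknown = set(filters) - ALLOWED_DAG_RUN_FILTERS
    if unknown:
        raise ValueError("unsupported filters: %s" % ", ".join(sorted(unknown)))
    # Drop unset filters and sort for a stable, predictable query string.
    params = sorted((k, v) for k, v in filters.items() if v is not None)
    url = base_url.rstrip("/") + "/api/experimental/dag_runs"
    if params:
        url += "?" + urlencode(params)
    return url
```

Several filters can be combined in one request, which is what makes this a generalized search rather than a per-DAG lookup.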

 

N6.

    GET /api/experimental/task_instances

    GET /api/experimental/task_instances?state=

    GET /api/experimental/task_instances?state_not_equal=

    GET /api/experimental/task_instances?execution_date_before=

    GET /api/experimental/task_instances?execution_date_after=

    GET /api/experimental/task_instances?dag_id=

    GET /api/experimental/task_instances?task_id=

 

    To satisfy requirement ‘J’. This endpoint will act as a generalized filter to search for task_instances.

 

N7.

 

    GET /api/experimental/dags/<DAG_ID>/dag_runs/<execution_date>/tasks/<TASK_ID>/logs?page=x

    GET /api/experimental/dags/<DAG_ID>/dag_runs/<execution_date>/tasks/<TASK_ID>/logs

    

    To satisfy requirement ‘K’.
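
The semantics of the page parameter are not defined in this proposal. The sketch below assumes 1-indexed pages with a fixed number of log lines per page and returns the whole log when no page is given; the function name and the page_size default are hypothetical.

```python
def paginate_log(log_text, page=None, page_size=100):
    """Return the log lines for one page, or all lines when page is None.

    Assumptions: pages are 1-indexed and each holds page_size lines.
    """
    lines = log_text.splitlines()
    if page is None:
        return lines
    if page < 1:
        raise ValueError("page must be >= 1")
    start = (page - 1) * page_size
    return lines[start:start + page_size]
```

A page past the end of the log yields an empty list, which a client could treat as the signal to stop requesting further pages.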


