Posted to commits@airflow.apache.org by "Alejandro Fernandez (JIRA)" <ji...@apache.org> on 2019/04/15 18:22:00 UTC

[jira] [Commented] (AIRFLOW-4315) Improve Airflow's Experimantal API for all kind of monitoring requirements

    [ https://issues.apache.org/jira/browse/AIRFLOW-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818248#comment-16818248 ] 

Alejandro Fernandez commented on AIRFLOW-4315:
----------------------------------------------

[~raviagarwal], thank you for this proposal. I agree it would help expand the use cases for the API.

> Improve Airflow's Experimantal API for all kind of monitoring requirements
> --------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4315
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4315
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 1.10.3
>            Reporter: Ravi Agarwal
>            Assignee: Ravi Agarwal
>            Priority: Major
>              Labels: features, ready-to-commit
>             Fix For: 1.10.4
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> *Goal*
>  
> We want to contribute to Airflow’s experimental API by extending its set of endpoints to enable monitoring of DAGs. To achieve this, we would like Airflow to have the following capabilities:
> A) Get the list of all DAGs in Airflow
> B) Get the details for a particular DAG identified by a dag_id
> C) Get the list of tasks, with their sequence details, for a particular DAG identified by a dag_id
> D) Get details for a task identified by dag_id and task_id
> Historic Results for DAG Run:
> E) Get details of all dag_runs for a particular DAG identified by a dag_id
> F) Get details of a specific dag_run for a particular DAG identified by dag_id and execution_date
> G) Get details of all dag_runs filtered by various parameters like state, execution time, execution interval etc.
> Historic Results for Task Instance:
> -H) Get list of task_instances for a dag_run of a particular DAG identified by a dag_id and execution_date-
> I) Get details of a task_instance for a dag_run of a DAG identified by dag_id, execution_date and task_id
> J) Get details of all task_instances filtered by various parameters like state, execution interval etc.
> Logs Monitoring:
> K) Get logs pertaining to a particular task_instance identified by dag_id, execution_date and task_id
> h1. What already exists in Airflow's experimental API and changes which would improve the usability:
>  
> E1.
>     GET /api/experimental/dags/<DAG_ID>/dag_runs
>     GET /api/experimental/dags/<DAG_ID>/dag_runs?state=
>            Returns a list of Dag Runs for a specific DAG ID.
>  
> This satisfies requirement ‘E’, but it would be good to add more filters to this endpoint, such as filtering dag_runs that ran within a given time interval, before a given time, or after a given time, or whose state is NOT equal to a given state.
>  
>     Proposal -
>        GET /api/experimental/dags/<DAG_ID>/dag_runs?state_not_equal=    
>        GET /api/experimental/dags/<DAG_ID>/dag_runs?execution_before=   
>        GET /api/experimental/dags/<DAG_ID>/dag_runs?execution_after=   
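A client could assemble the proposed filter URLs roughly as follows. This is only a sketch: the query parameter names are the ones proposed above and are not part of the existing experimental API, and `dag_runs_url` and the host used in the example are hypothetical names chosen for illustration.

```python
from urllib.parse import quote, urlencode


def dag_runs_url(base_url, dag_id, **filters):
    """Build a dag_runs URL for one DAG; filters whose value is None are skipped."""
    # Percent-encode the DAG id so it is safe to use as a single path segment.
    path = "/api/experimental/dags/{}/dag_runs".format(quote(dag_id, safe=""))
    # Assumption: the filter names mirror the proposed query parameters
    # (state_not_equal, execution_before, execution_after).
    params = {name: value for name, value in filters.items() if value is not None}
    query = urlencode(params)
    return base_url.rstrip("/") + path + ("?" + query if query else "")


# Hypothetical host; urlencode percent-encodes the colons in the timestamp.
print(dag_runs_url("http://localhost:8080", "my_dag",
                   execution_after="2019-04-15T00:00:00"))
```

Skipping `None` values lets one helper serve both the unfiltered and the filtered form of the endpoint.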
> E2.
>     GET /api/experimental/dags/<dag_id>/dag_runs/<execution_date>
> Returns a JSON with a dag_run’s public instance variables. The format for the <execution_date> is expected to be “YYYY-mm-DDTHH:MM:SS”, for example: “2016-11-16T11:34:15”.
>  
> This endpoint is a good candidate to satisfy requirement ‘F’, but it returns only the state of the identified dag_run. It should return more details about the dag_run than just its state.
>  
>     Proposal :
>  
> Modify the response object of this endpoint to return the same details that [/dags/<DAG_ID>/dag_runs] returns for each object in its list.
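For reference, the <execution_date> format quoted for this endpoint, “YYYY-mm-DDTHH:MM:SS”, corresponds to the `strftime` pattern below. This is a minimal sketch using only the Python standard library; the helper names are hypothetical and time zone handling is deliberately left out.

```python
from datetime import datetime

# strftime/strptime pattern matching "YYYY-mm-DDTHH:MM:SS".
EXECUTION_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_execution_date(text):
    """Parse an execution_date path segment into a naive datetime."""
    return datetime.strptime(text, EXECUTION_DATE_FORMAT)


def format_execution_date(value):
    """Render a datetime in the form expected in the URL path."""
    return value.strftime(EXECUTION_DATE_FORMAT)


# Round-trip the example timestamp from the endpoint description.
parsed = parse_execution_date("2016-11-16T11:34:15")
print(format_execution_date(parsed))
```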
>  
> E3.
>     GET /api/experimental/dags/<DAG_ID>/tasks/<TASK_ID>   
> Returns info for a task.
>  
> This endpoint satisfies requirement ‘D’. It can be improved by adding which tasks are upstream and downstream of the identified task; the operator type would also be useful in the response.
>  
>     Proposal :
>      
>        Add operator type, list of upstream tasks & list of downstream tasks to the response object to increase usability.
>  
> E4.
>     GET /api/experimental/dags/<DAG_ID>/dag_runs/<execution_date>/tasks/<TASK_ID>      
> Returns a JSON with a task instance’s public instance variables. The format for the <execution_date> is expected to be “YYYY-mm-DDTHH:MM:SS”, for example: “2016-11-16T11:34:15”.
>  
> This endpoint satisfies requirement ‘I’. It can be improved by adding which task instances were executed upstream and downstream. From a monitoring perspective, it would also be useful to return the number of attempts made so far if the task is running, or the total number of attempts if it failed or succeeded.
>  
>     Proposal :
>      
> Add list of upstream task_instances, list of downstream task_instances, and number of attempts to the response object to increase usability.
>  
> E5.
>     GET /api/experimental/latest_runs
> Returns the latest DagRun for each DAG formatted for the UI.
>  
>     This endpoint satisfies the requirement ‘F’ partially.
> h1. Proposed New Endpoints to Airflow API:
>  
> N1.
>     GET /api/experimental/dags               
>     GET /api/experimental/dags?is_paused=
>        Return a list of all available DAGs; the list can also be filtered by the paused state of the DAGs.
>  
>     To satisfy requirement ‘A’.
>      
> N2.
>     GET /api/experimental/dags/<DAG_ID>
>        Return information for a specific DAG
>  
>     To satisfy requirement ‘B’.
>  
> N3.
>     GET /api/experimental/dags/<DAG_ID>/tasks
>        Return a list of all tasks that are part of the DAG identified by DAG_ID
>    
>     To satisfy requirement ‘C’.
>  
> -N4.-
>     -GET /api/experimental/dags/<DAG_ID>/dag_runs/<execution_date>/tasks-
>        -Return a list of all task_instances part of the DAG_ID's particular dag_run-
>  
>     -To satisfy requirement ‘H’.-
>   
> N5.
>     GET /api/experimental/dag_runs
>     GET /api/experimental/dag_runs?state=
>     GET /api/experimental/dag_runs?state_not_equal=
>     GET /api/experimental/dag_runs?execution_date_before=
>     GET /api/experimental/dag_runs?execution_date_after=
>     GET /api/experimental/dag_runs?dag_id=
>  
>     To satisfy requirement ‘G’. This endpoint will act as a generalized filter to search for dag_runs.
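One possible reading of how these filters combine is sketched below as an in-memory filter. It is illustrative only: it assumes every supplied filter must match (logical AND), that both time bounds are exclusive, and that each dag_run is a plain dict with `dag_id`, `state` and `execution_date` keys. None of these details is specified in the proposal, and `filter_dag_runs` is a hypothetical name.

```python
from datetime import datetime


def filter_dag_runs(dag_runs, dag_id=None, state=None, state_not_equal=None,
                    execution_date_before=None, execution_date_after=None):
    """Return the runs that match every filter that is not None."""
    matched = []
    for run in dag_runs:
        if dag_id is not None and run["dag_id"] != dag_id:
            continue
        if state is not None and run["state"] != state:
            continue
        if state_not_equal is not None and run["state"] == state_not_equal:
            continue
        # Assumption: both time bounds are exclusive.
        if (execution_date_before is not None
                and not run["execution_date"] < execution_date_before):
            continue
        if (execution_date_after is not None
                and not run["execution_date"] > execution_date_after):
            continue
        matched.append(run)
    return matched


# Made-up sample data, for illustration only.
runs = [
    {"dag_id": "etl", "state": "success", "execution_date": datetime(2019, 4, 14)},
    {"dag_id": "etl", "state": "failed", "execution_date": datetime(2019, 4, 15)},
    {"dag_id": "report", "state": "running", "execution_date": datetime(2019, 4, 15)},
]
not_successful = filter_dag_runs(runs, state_not_equal="success")
```

A server-side implementation would more likely push these conditions into a database query, but the matching rules would be the same.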
>  
> N6.
>     GET /api/experimental/task_instances
>     GET /api/experimental/task_instances?state=
>     GET /api/experimental/task_instances?state_not_equal=
>     GET /api/experimental/task_instances?execution_date_before=
>     GET /api/experimental/task_instances?execution_date_after=
>     GET /api/experimental/task_instances?dag_id=
>     GET /api/experimental/task_instances?task_id=
>  
>     To satisfy requirement ‘J’. This endpoint will act as a generalized filter to search for task_instances.
>  
> N7.
>  
>     GET /api/experimental/dags/<DAG_ID>/dag_runs/<execution_date>/tasks/<TASK_ID>/logs
>    
>     To satisfy requirement ‘K’.


