You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/01/20 10:48:34 UTC

[GitHub] [airflow] lokeshlal opened a new pull request #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

lokeshlal opened a new pull request #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216
 
 
   ** This is just an example to present the approach to solve AIRFLOW-4540/AIRFLOW-574 
   ** I have only done code for Graph view in this example, no test cases has been implemented
   ** Once reviewers gives a go-ahead on the approach, I will implement for the rest of the views as well as add test cases and migration script
   
   Dag should be rendered on graph view and other views as per dag runs rather than the stored dag serialized information or dag file.
   
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [X] Description above provides context of the change
   - [X] Commit message/PR title starts with `[AIRFLOW-4540]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
   - [ ] Unit tests coverage for changes (not needed for documentation changes)
   - [X] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [ ] Relevant documentation is updated including usage instructions.
   - [X] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   <sup>*</sup> For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576255819
 
 
   Thank you for the response. 
   However, I am talking about a case where a single dag can take many forms at the same time. For example, consider there is a need to run ETL job for 2 different customers (both will have data in s3 buckets and first customer have data in 2 buckets (s3_b1_customer_1 and s3_b2_customer_1) and second customer have data in 3 buckets (s3_b1_customer_2, s3_b2_customer_2 and s3_b3_customer_2). Then in that case DAG job for first customer will trigger 2 parallel tasks for 2 buckets and DAG for second customer will trigger 3 tasks in parallel for 3 buckets.
   So view for both the DAGs should represent correct number of tasks within dag. 
   Any thoughts about above scenario. 
   Thank you.
   Lokesh

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576218537
 
 
   Just realised Graph view will require 2 views.. one for the dag run and one when there is no dag run... that behavior will stay as is...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576235739
 
 
   @kaxil @potiuk could you please provide your inputs on the approach. Thank you.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] lokeshlal closed pull request #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
lokeshlal closed pull request #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576329017
 
 
   @kaxil @potiuk thank you guys for taking time to explain things. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
lokeshlal commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576326476
 
 
   @potiuk thank you... I got your point. I have never thought about this approach of creating multiple DAGs for different customers. Provides clear visibility to operations team as well. 
   Another quick question, so different versions of serialised DAGs at different time will be aligned to the dagrun execution time?
   Thanks again.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] kaxil commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576237040
 
 
   Good work @lokeshlal . I was planning to handle it as a part of DAG Serialization where we would store previous X versions of the Serialized DAGs in the serialized_dag table. 
   
   That would help us in cases where tasks were added or removed in previous representations too. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] kaxil edited a comment on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576328373
 
 
   So my plan was to add this to DagRuns and not tasks. As tasks can change. We would associate a particular serialized Dag version with a DagRun. Hence in your case the DagRuns for both customers would be different.
   
   And @potiuk has correctly covered a good solution to your problem too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] kaxil commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576328373
 
 
   So my plan was to add this to DagRuns and not tasks. As tasks can change. We would associate a particular serialized Dag version with a DagRun. Hence in your case the DagRuns for both customers would be different

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576322320
 
 
   @lokeshlal  - I think the scenario above is a clear case why you would like to have 2 different DAGs for those two different customers. 
   
   It might look kinda duplication but it is not. That's the power of Airflow that builds on the power of python - you can use the power of python and make DAG definition and parameterisation in common import file and instantiate two different DAGs for two different customers with different DAG ids. 
   
   This is how it should be solved, not having completely different set of tasks for different runs of the same DAG.  DAG structure - if changed - should evolve slowly. 
   
   I think there are many, many problems to solve if you would like to do it as you plan to do it now - I think the edge cases will be really really complex and I'd rather wait for what @kaxil explained.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] kaxil edited a comment on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on issue #7216: [AIRFLOW-4540] Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure
URL: https://github.com/apache/airflow/pull/7216#issuecomment-576237040
 
 
   Good work @lokeshlal . I was planning to handle it as a part of DAG Serialization where we would store previous X versions of the Serialized DAGs in the serialized_dag table. 
   
   That would help us in cases where tasks were added or removed in previous representations too. 
   
   This is explained briefly in https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
   
   and I will be adding more details soon once I am back in the UK from India.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services