Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/02/15 01:04:46 UTC

[GitHub] [airflow] MatthewRBruce opened a new pull request #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated

MatthewRBruce opened a new pull request #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated
URL: https://github.com/apache/airflow/pull/7424
 
 
   DAG serializations were previously deleted based on whether the
   DagFileProcessorManager had processed a particular Python file.  This caused issues if DAGs were imported via `globals()` from a different Python file.  This
   changes the deletion behaviour to be based on the last time a DAG was processed by the
   scheduler instead.
   
   This also moves the cleanup of stale DAGs from `SchedulerJob` to `DagFileProcessorManager` to support long-running schedulers.
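
To illustrate the idea only (this is not the code in the PR), here is a minimal sketch of selecting stale serializations by their last-processed timestamp rather than by which files were parsed. The function name `find_stale_dag_ids`, the `max_age` parameter, and the in-memory dict standing in for the serialized DAG table are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def find_stale_dag_ids(
    last_updated_by_dag_id: Dict[str, datetime],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return IDs of DAGs whose serialization was last refreshed more than max_age ago."""
    cutoff = now - max_age
    return sorted(
        dag_id
        for dag_id, last_updated in last_updated_by_dag_id.items()
        if last_updated < cutoff
    )


# Hypothetical rows: one DAG exposed via globals() from another file and
# still being refreshed, one DAG whose file no longer exists.
now = datetime(2020, 2, 15, 1, 0, tzinfo=timezone.utc)
rows = {
    "generated_via_globals": now - timedelta(seconds=30),
    "removed_from_disk": now - timedelta(hours=2),
}
stale = find_stale_dag_ids(rows, now, max_age=timedelta(minutes=10))
print(stale)  # ['removed_from_disk']
```

Because the check looks only at the timestamp, a DAG defined in one file but exposed through another stays alive for as long as the scheduler keeps processing it.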
   
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [X] Description above provides context of the change
   - [X] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
   - [X] Unit tests coverage for changes (not needed for documentation changes)
   - [X] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [X] Relevant documentation is updated including usage instructions.
   - [X] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   <sup>*</sup> For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] kaxil commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated
URL: https://github.com/apache/airflow/pull/7424#issuecomment-604696745
 
 
   Can you rebase on latest master please @MatthewRBruce 


[GitHub] [airflow] codecov-io edited a comment on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated
URL: https://github.com/apache/airflow/pull/7424#issuecomment-586545461
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=h1) Report
   > Merging [#7424](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/5b0f5410312b2c83f277564926f2c7f049621391&el=desc) will **decrease** coverage by `22.40%`.
   > The diff coverage is `14.28%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7424/graphs/tree.svg?width=650&height=150&src=pr&token=WdLKlKHOAU)](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #7424       +/-   ##
   ===========================================
   - Coverage   87.17%   64.77%   -22.41%     
   ===========================================
     Files         933      932        -1     
     Lines       45342    45341        -1     
   ===========================================
   - Hits        39526    29368    -10158     
   - Misses       5816    15973    +10157     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==) | `12.53% <0.00%> (-79.61%)` | :arrow_down: |
   | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `25.37% <5.88%> (-62.58%)` | :arrow_down: |
   | [airflow/models/dagcode.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnY29kZS5weQ==) | `44.21% <40.00%> (-48.27%)` | :arrow_down: |
   | [airflow/models/serialized\_dag.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvc2VyaWFsaXplZF9kYWcucHk=) | `57.64% <50.00%> (-36.54%)` | :arrow_down: |
   | [airflow/hooks/S3\_hook.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9TM19ob29rLnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/hooks/hdfs\_hook.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9oZGZzX2hvb2sucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/hooks/http\_hook.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9odHRwX2hvb2sucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/hooks/jdbc\_hook.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9qZGJjX2hvb2sucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [airflow/hooks/druid\_hook.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9kcnVpZF9ob29rLnB5) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [508 more](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=footer). Last update [8e89780...6bec069](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


[GitHub] [airflow] MatthewRBruce commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated
URL: https://github.com/apache/airflow/pull/7424#issuecomment-605027141
 
 
   Sure thing, I wasn't sure if this would still be needed with all the other serialization changes going on


[GitHub] [airflow] MatthewRBruce commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated
URL: https://github.com/apache/airflow/pull/7424#issuecomment-606012275
 
 
   @kaxil I've rebased it now. I changed how the `DagCode` entries get cleaned up: it is now based on whether there are any DAGs in the DB with matching `fileloc`s, rather than on which `file_paths` were found.
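
As a rough sketch of that rule (not the PR's actual query), the orphaned `DagCode` entries are the set difference between the `fileloc`s stored for code and the `fileloc`s still referenced by DAG rows. The function name `orphaned_code_filelocs` and the plain lists standing in for database rows are assumptions for illustration.

```python
from typing import Iterable, Set


def orphaned_code_filelocs(
    dag_filelocs: Iterable[str],
    dag_code_filelocs: Iterable[str],
) -> Set[str]:
    """Return DagCode filelocs that no DAG row in the DB points at any more."""
    return set(dag_code_filelocs) - set(dag_filelocs)


# Hypothetical data: three stored code entries, two of them still backed by a DAG.
dag_filelocs = ["/dags/factory.py", "/dags/etl.py"]
dag_code_filelocs = ["/dags/factory.py", "/dags/etl.py", "/dags/old_report.py"]
orphans = orphaned_code_filelocs(dag_filelocs, dag_code_filelocs)
print(orphans)  # {'/dags/old_report.py'}
```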


[GitHub] [airflow] codecov-io commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated
URL: https://github.com/apache/airflow/pull/7424#issuecomment-586545461
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=h1) Report
   > Merging [#7424](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/edcad79b8d1ee359586c843c6e685cc58d6abacf?src=pr&el=desc) will **decrease** coverage by `0.17%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7424/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7424      +/-   ##
   ==========================================
   - Coverage   86.56%   86.38%   -0.18%     
   ==========================================
     Files         878      878              
     Lines       41163    41171       +8     
   ==========================================
   - Hits        35632    35566      -66     
   - Misses       5531     5605      +74
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `89.07% <100%> (+1.14%)` | :arrow_up: |
   | [airflow/models/serialized\_dag.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvc2VyaWFsaXplZF9kYWcucHk=) | `92.59% <100%> (-0.1%)` | :arrow_down: |
   | [...w/providers/apache/hive/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL2hpdmUvb3BlcmF0b3JzL215c3FsX3RvX2hpdmUucHk=) | `100% <0%> (ø)` | :arrow_up: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | [airflow/security/kerberos.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==) | `76.08% <0%> (ø)` | :arrow_up: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `47.18% <0%> (-45.08%)` | :arrow_down: |
   | [airflow/providers/mysql/operators/mysql.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvbXlzcWwvb3BlcmF0b3JzL215c3FsLnB5) | `100% <0%> (ø)` | :arrow_up: |
   | [...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==) | `69.38% <0%> (-25.52%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | ... and [8 more](https://codecov.io/gh/apache/airflow/pull/7424/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=footer). Last update [edcad79...e339941](https://codecov.io/gh/apache/airflow/pull/7424?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


[GitHub] [airflow] MatthewRBruce commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on issue #7424: [AIRFLOW-6796] Clean up DAG serializations based on last_updated
URL: https://github.com/apache/airflow/pull/7424#issuecomment-606870400
 
 
   It won't delete the DAG until the time since the scheduler last processed that DAG has surpassed `dag_cleanup_interval`. So yes, if the DAG file was deleted, it will take around `dag_cleanup_interval`s for it to be removed.  I don't believe the DAG being updated or tasks being added/deleted will come into play.
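
The timing described above can be walked through with a small worked example. This is only a sketch: the helper `first_cleanup_pass_removing`, the cadence of cleanup passes, and the concrete times are hypothetical, and the strict `>` comparison is an assumption taken from the word "surpassed".

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def first_cleanup_pass_removing(
    last_processed: datetime,
    dag_cleanup_interval: timedelta,
    cleanup_passes: Iterable[datetime],
) -> Optional[datetime]:
    """Return the first cleanup pass at which the DAG is old enough to delete, or None."""
    for pass_time in cleanup_passes:
        if pass_time - last_processed > dag_cleanup_interval:
            return pass_time
    return None


# The DAG file disappears right after the scheduler processed it at 12:00,
# so 12:00 remains its last-processed time from then on.
last_processed = datetime(2020, 1, 1, 12, 0)
# Hypothetical cleanup passes at 12:02, 12:06, 12:10 and 12:14.
passes = [last_processed + timedelta(minutes=2 + 4 * i) for i in range(4)]
removed_at = first_cleanup_pass_removing(
    last_processed, timedelta(minutes=10), passes
)
print(removed_at)  # 2020-01-01 12:14:00
```

The pass at 12:10 does not remove the DAG because exactly ten minutes have elapsed, which does not exceed the interval; the next pass does.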
