You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/01/13 19:52:20 UTC
[GitHub] [airflow] tooptoop4 opened a new pull request #7157: [AIRFLOW-6251]
add config for max tasks per dag
tooptoop4 opened a new pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157
---
Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
Make sure to mark the boxes below before creating PR: [x]
- [x] Description above provides context of the change
- [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
- [x] Unit tests coverage for changes (not needed for documentation changes)
- [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
- [x] Relevant documentation is updated including usage instructions.
- [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
<sup>*</sup> For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
---
In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366360841
##########
File path: airflow/models/dagbag.py
##########
@@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
dag.full_filepath = filepath
if dag.fileloc != filepath and not is_zipfile:
dag.fileloc = filepath
- try:
- dag.is_subdag = False
- self.bag_dag(dag, parent_dag=dag, root_dag=dag)
- if isinstance(dag._schedule_interval, str):
- croniter(dag._schedule_interval)
- found_dags.append(dag)
- found_dags += dag.subdags
- except (CroniterBadCronError,
- CroniterBadDateError,
- CroniterNotAlphaError) as cron_e:
- self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+ num_tasks = len(dag.tasks)
+ if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
Review comment:
Really? but I remember we have a PR to do that, I will task a look
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366661407
##########
File path: tests/models/test_dagbag.py
##########
@@ -151,6 +151,42 @@ def test_zip(self):
dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
self.assertTrue(dagbag.get_dag("test_zip_dag"))
+ @conf_vars({('core', 'max_tasks_per_dag'): '5'})
+ def test_process_file_max_task_check(self):
+ """
+ test if num_tasks > max_tasks_per_dag can be identified
+ """
+ a_dag_id = "example_short_circuit_operator"
Review comment:
```suggestion
a_dag_id = "test_example_bash_operator"
```
I take a quick look. `TEST_DAGS_FOLDER` point to `airflow/tests/models/../dags` and this path not including dag `example_short_circuit_operator`. So you could not process_file `airflow/tests/models/../dags/example_short_circuit_operator.py`
Maybe you should use others dag in `airflow/tests/models/../dags`(which `TEST_DAGS_FOLDER` point to), just like `test_example_bash_operator` and change `@conf_vars({('core', 'max_tasks_per_dag'): '5'})` to `@conf_vars({('core', 'max_tasks_per_dag'): '7'})`.
So as others test
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] kaxil commented on issue #7157: [AIRFLOW-6251] add config
for max tasks per dag
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-581383090
> @kaxil WDYT
I agree with @xinbinhuang that we might not need this setting after all ! Although that being said, create a post on dev mailing list and see if there is any interest in this feature, if so we can think about it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574886874
# [Codecov](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=h1) Report
> :exclamation: No coverage uploaded for pull request base (`master@92521aa`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
> The diff coverage is `86.66%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7157/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #7157 +/- ##
========================================
Coverage ? 85.3%
========================================
Files ? 723
Lines ? 40701
Branches ? 0
========================================
Hits ? 34720
Misses ? 5981
Partials ? 0
```
| [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models/pool.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvcG9vbC5weQ==) | `97.36% <ø> (ø)` | |
| [airflow/contrib/operators/databricks\_operator.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9kYXRhYnJpY2tzX29wZXJhdG9yLnB5) | `92.24% <ø> (ø)` | |
| [.../providers/apache/spark/hooks/spark\_jdbc\_script.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL3NwYXJrL2hvb2tzL3NwYXJrX2pkYmNfc2NyaXB0LnB5) | `0% <ø> (ø)` | |
| [airflow/bin/cli.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `94.73% <ø> (ø)` | |
| [...ders/google/cloud/example\_dags/example\_dataproc.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2V4YW1wbGVfZGFncy9leGFtcGxlX2RhdGFwcm9jLnB5) | `0% <0%> (ø)` | |
| [airflow/macros/hive.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9tYWNyb3MvaGl2ZS5weQ==) | `38.7% <0%> (ø)` | |
| [airflow/utils/log/gcs\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZ2NzX3Rhc2tfaGFuZGxlci5weQ==) | `0% <0%> (ø)` | |
| [airflow/models/connection.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==) | `68.78% <0%> (ø)` | |
| [airflow/contrib/operators/vertica\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy92ZXJ0aWNhX3RvX2hpdmUucHk=) | `0% <0%> (ø)` | |
| [airflow/gcp/operators/bigquery.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9nY3Avb3BlcmF0b3JzL2JpZ3F1ZXJ5LnB5) | `91.59% <100%> (ø)` | |
| ... and [98 more](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=footer). Last update [92521aa...bf5d94d](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] xinbinhuang commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-576943577
I have a probably unwelcome comment. But do we really need to have this config? I feel like this is something that should be checked during CI/CD before it deploys to Airflow. It gives an extra overhead (though trivial) when parsing the file. Also, it makes a bit harder to read the code (which is already a bit hard to read)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] stale[bot] commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
stale[bot] commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-601150882
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366661407
##########
File path: tests/models/test_dagbag.py
##########
@@ -151,6 +151,42 @@ def test_zip(self):
dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
self.assertTrue(dagbag.get_dag("test_zip_dag"))
+ @conf_vars({('core', 'max_tasks_per_dag'): '5'})
+ def test_process_file_max_task_check(self):
+ """
+ test if num_tasks > max_tasks_per_dag can be identified
+ """
+ a_dag_id = "example_short_circuit_operator"
Review comment:
```suggestion
a_dag_id = "test_example_bash_operator"
```
I take a quick look. `TEST_DAGS_FOLDER` point to `airflow/tests/models/../dags` and this path not including dag `example_short_circuit_operator`. So you could not process_file `airflow/tests/models/../dags/example_short_circuit_operator.py`
Maybe you should use others dag in `airflow/tests/models/../dags`(which `TEST_DAGS_FOLDER` point to), just like `test_example_bash_operator` and change `@conf_vars({('core', 'max_tasks_per_dag'): '5'})` to `@conf_vars({('core', 'max_tasks_per_dag'): '7'})`.
So as others test @tooptoop4
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366241733
##########
File path: airflow/models/dagbag.py
##########
@@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
dag.full_filepath = filepath
if dag.fileloc != filepath and not is_zipfile:
dag.fileloc = filepath
- try:
- dag.is_subdag = False
- self.bag_dag(dag, parent_dag=dag, root_dag=dag)
- if isinstance(dag._schedule_interval, str):
- croniter(dag._schedule_interval)
- found_dags.append(dag)
- found_dags += dag.subdags
- except (CroniterBadCronError,
- CroniterBadDateError,
- CroniterNotAlphaError) as cron_e:
- self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+ num_tasks = len(dag.tasks)
+ if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
Review comment:
```suggestion
if 0 < max_tasks_per_dag < num_tasks:
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-601195402
be gone stalebot
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366360841
##########
File path: airflow/models/dagbag.py
##########
@@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
dag.full_filepath = filepath
if dag.fileloc != filepath and not is_zipfile:
dag.fileloc = filepath
- try:
- dag.is_subdag = False
- self.bag_dag(dag, parent_dag=dag, root_dag=dag)
- if isinstance(dag._schedule_interval, str):
- croniter(dag._schedule_interval)
- found_dags.append(dag)
- found_dags += dag.subdags
- except (CroniterBadCronError,
- CroniterBadDateError,
- CroniterNotAlphaError) as cron_e:
- self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+ num_tasks = len(dag.tasks)
+ if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
Review comment:
Really? but I remember we have a PR to do that, I will take a look
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366363132
##########
File path: airflow/models/dagbag.py
##########
@@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
dag.full_filepath = filepath
if dag.fileloc != filepath and not is_zipfile:
dag.fileloc = filepath
- try:
- dag.is_subdag = False
- self.bag_dag(dag, parent_dag=dag, root_dag=dag)
- if isinstance(dag._schedule_interval, str):
- croniter(dag._schedule_interval)
- found_dags.append(dag)
- found_dags += dag.subdags
- except (CroniterBadCronError,
- CroniterBadDateError,
- CroniterNotAlphaError) as cron_e:
- self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+ num_tasks = len(dag.tasks)
+ if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
Review comment:
@ashb After some quick look, in
https://github.com/apache/airflow/blob/4a344f13d26ecbb627bb9968895b290bfd86e4da/airflow/cli/commands/webserver_command.py#L90
And it work in
```py
In [1]: x = 0
In [2]: num_task = 10
In [3]: 0 < x < num_task
Out[3]: False
In [4]: 0 <= x < num_task
Out[4]: True
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] tooptoop4 commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366445111
##########
File path: airflow/models/dagbag.py
##########
@@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
dag.full_filepath = filepath
if dag.fileloc != filepath and not is_zipfile:
dag.fileloc = filepath
- try:
- dag.is_subdag = False
- self.bag_dag(dag, parent_dag=dag, root_dag=dag)
- if isinstance(dag._schedule_interval, str):
- croniter(dag._schedule_interval)
- found_dags.append(dag)
- found_dags += dag.subdags
- except (CroniterBadCronError,
- CroniterBadDateError,
- CroniterNotAlphaError) as cron_e:
- self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+ num_tasks = len(dag.tasks)
+ if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
Review comment:
confusion and need to test it proves the shorthand syntax is not clear, readable code
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] xinbinhuang commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-577052428
You can check the task count in the CI/CD process after deploying to source control similar to other DAG validation tests that you may have ([relevant blog post](https://blog.usejournal.com/testing-in-airflow-part-1-dag-validation-tests-dag-definition-tests-and-unit-tests-2aa94970570c) & [gist](https://gist.github.com/criccomini/2862667822af7fae8b55682faef029a7). A simple example:
```python
import unittest
from airflow.models import DagBag
class TestTaskCountPerDag(unittest.TestCase):
"""
Test the if the number of tasks per dag exceed the threshold
"""
MAX_TASKS_PER_DAG = 3
def _get_dagbag(self):
dag_folder = os.getenv('AIRFLOW_DAGS', False)
self.assertTrue(
dag_folder,
'AIRFLOW_DAGS must be set to a folder that has DAGs in it.')
return DagBag(dag_folder=dag_folder, include_examples=False)
def test_max_tasks_per_dag(self):
"""
Verify that tasks per dag do not exceed the threshold
"""
dagbag = self._get_dagbag()
dags = dagbag.dags
for dag_name, dag in dags.items():
self.assertLessEqual(dag.task_count,
MAX_TASKS_PER_DAG,
msg=('Numer of tasks for DAG {} is greater than ' +
'the threshold {}').format(dag_name, MAX_TASKS_PER_DAG)
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] tooptoop4 removed a comment on issue #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
tooptoop4 removed a comment on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574418492
@zhongjiajie any idea on why test not working?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-578975631
@kaxil WDYT
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574216275
> > If want to upload huge dag, should we set default value to not unlimit? BTW, should we add some doc to describe it?
>
> existing behaviour is unlimited, i want this to not break existing users, so it is 'opt-in'
I prefer to add use fix value for it and change the `UPDATING.md`
If we decide to use unlimited, at least we should add some document to remind users.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366660224
##########
File path: airflow/models/dagbag.py
##########
@@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
dag.full_filepath = filepath
if dag.fileloc != filepath and not is_zipfile:
dag.fileloc = filepath
- try:
- dag.is_subdag = False
- self.bag_dag(dag, parent_dag=dag, root_dag=dag)
- if isinstance(dag._schedule_interval, str):
- croniter(dag._schedule_interval)
- found_dags.append(dag)
- found_dags += dag.subdags
- except (CroniterBadCronError,
- CroniterBadDateError,
- CroniterNotAlphaError) as cron_e:
- self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+ num_tasks = len(dag.tasks)
+ if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
Review comment:
But code base already have some example and IDE like pycharm will give a hint change to `var1 < var2 < var3`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-577039208
@xinbinhuang a dynamic dag may generate tasks at runtime based off a list of files in a 3rd party server so no way for cicd to trap it
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366363132
##########
File path: airflow/models/dagbag.py
##########
@@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
dag.full_filepath = filepath
if dag.fileloc != filepath and not is_zipfile:
dag.fileloc = filepath
- try:
- dag.is_subdag = False
- self.bag_dag(dag, parent_dag=dag, root_dag=dag)
- if isinstance(dag._schedule_interval, str):
- croniter(dag._schedule_interval)
- found_dags.append(dag)
- found_dags += dag.subdags
- except (CroniterBadCronError,
- CroniterBadDateError,
- CroniterNotAlphaError) as cron_e:
- self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+ num_tasks = len(dag.tasks)
+ if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
Review comment:
After some quick look, in
https://github.com/apache/airflow/blob/4a344f13d26ecbb627bb9968895b290bfd86e4da/airflow/cli/commands/webserver_command.py#L90
And it work in
```py
In [1]: x = 0
In [2]: num_task = 10
In [3]: 0 < x < num_task
Out[3]: False
In [4]: 0 <= x < num_task
Out[4]: True
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574418492
@zhongjiajie any idea on why test not working?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574213590
> If want to upload huge dag, should we set default value to not unlimit? BTW, should we add some doc to describe it?
existing behaviour is unlimited, i want this to not break existing users, so it is 'opt-in'
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-575794455
travis happy
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] tooptoop4 edited a comment on issue #7157: [AIRFLOW-6251]
add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-577039208
@xinbinhuang a dynamic dag may generate tasks at runtime based off a list of files in a 3rd party server (ie an sftp drive to be downloaded from and its csvs ingested) so no way for cicd to trap it
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] ashb commented on a change in pull request #7157:
[AIRFLOW-6251] add config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366303058
##########
File path: airflow/models/dagbag.py
##########
@@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
dag.full_filepath = filepath
if dag.fileloc != filepath and not is_zipfile:
dag.fileloc = filepath
- try:
- dag.is_subdag = False
- self.bag_dag(dag, parent_dag=dag, root_dag=dag)
- if isinstance(dag._schedule_interval, str):
- croniter(dag._schedule_interval)
- found_dags.append(dag)
- found_dags += dag.subdags
- except (CroniterBadCronError,
- CroniterBadDateError,
- CroniterNotAlphaError) as cron_e:
- self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+ num_tasks = len(dag.tasks)
+ if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
Review comment:
That doens't wotk @zhongjiajie
```
In [1]: x = 0
In [2]: num_task = 10
In [3]: 0 < x < num_task
Out[3]: False
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [airflow] zhongjiajie commented on issue #7157: [AIRFLOW-6251] add
config for max tasks per dag
Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574548195
FYI, Travis-CI still sad, this time is timeout, may maybe you should test int you local breeze using `pytest -vs tests/cli/commands/test_task_command.py` for more detail
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services