You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/01/13 19:52:20 UTC

[GitHub] [airflow] tooptoop4 opened a new pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

tooptoop4 opened a new pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157
 
 
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   <sup>*</sup> For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366360841
 
 

 ##########
 File path: airflow/models/dagbag.py
 ##########
 @@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
                         dag.full_filepath = filepath
                         if dag.fileloc != filepath and not is_zipfile:
                             dag.fileloc = filepath
-                    try:
-                        dag.is_subdag = False
-                        self.bag_dag(dag, parent_dag=dag, root_dag=dag)
-                        if isinstance(dag._schedule_interval, str):
-                            croniter(dag._schedule_interval)
-                        found_dags.append(dag)
-                        found_dags += dag.subdags
-                    except (CroniterBadCronError,
-                            CroniterBadDateError,
-                            CroniterNotAlphaError) as cron_e:
-                        self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+                    num_tasks = len(dag.tasks)
+                    if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
 
 Review comment:
   Really? but I remember we have a PR to do that, I will task a look

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366661407
 
 

 ##########
 File path: tests/models/test_dagbag.py
 ##########
 @@ -151,6 +151,42 @@ def test_zip(self):
         dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
         self.assertTrue(dagbag.get_dag("test_zip_dag"))
 
+    @conf_vars({('core', 'max_tasks_per_dag'): '5'})
+    def test_process_file_max_task_check(self):
+        """
+        test if num_tasks > max_tasks_per_dag can be identified
+        """
+        a_dag_id = "example_short_circuit_operator"
 
 Review comment:
   ```suggestion
           a_dag_id = "test_example_bash_operator"
   ```
   
   I take a quick look. `TEST_DAGS_FOLDER` point to `airflow/tests/models/../dags` and this path not including dag `example_short_circuit_operator`. So you could not process_file `airflow/tests/models/../dags/example_short_circuit_operator.py`
   
   Maybe you should use others dag in `airflow/tests/models/../dags`(which `TEST_DAGS_FOLDER` point to), just like `test_example_bash_operator` and change `@conf_vars({('core', 'max_tasks_per_dag'): '5'})` to `@conf_vars({('core', 'max_tasks_per_dag'): '7'})`.
   
   So as others test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] kaxil commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-581383090
 
 
   > @kaxil WDYT
   
   I agree with @xinbinhuang that we might not need this setting after all ! Although that being said, create a post on dev mailing list and see if there is any interest in this feature, if so we can think about it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574886874
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=h1) Report
   > :exclamation: No coverage uploaded for pull request base (`master@92521aa`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `86.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7157/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff            @@
   ##             master   #7157   +/-   ##
   ========================================
     Coverage          ?   85.3%           
   ========================================
     Files             ?     723           
     Lines             ?   40701           
     Branches          ?       0           
   ========================================
     Hits              ?   34720           
     Misses            ?    5981           
     Partials          ?       0
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/models/pool.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvcG9vbC5weQ==) | `97.36% <ø> (ø)` | |
   | [airflow/contrib/operators/databricks\_operator.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9kYXRhYnJpY2tzX29wZXJhdG9yLnB5) | `92.24% <ø> (ø)` | |
   | [.../providers/apache/spark/hooks/spark\_jdbc\_script.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYXBhY2hlL3NwYXJrL2hvb2tzL3NwYXJrX2pkYmNfc2NyaXB0LnB5) | `0% <ø> (ø)` | |
   | [airflow/bin/cli.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `94.73% <ø> (ø)` | |
   | [...ders/google/cloud/example\_dags/example\_dataproc.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL2V4YW1wbGVfZGFncy9leGFtcGxlX2RhdGFwcm9jLnB5) | `0% <0%> (ø)` | |
   | [airflow/macros/hive.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9tYWNyb3MvaGl2ZS5weQ==) | `38.7% <0%> (ø)` | |
   | [airflow/utils/log/gcs\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZ2NzX3Rhc2tfaGFuZGxlci5weQ==) | `0% <0%> (ø)` | |
   | [airflow/models/connection.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==) | `68.78% <0%> (ø)` | |
   | [airflow/contrib/operators/vertica\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy92ZXJ0aWNhX3RvX2hpdmUucHk=) | `0% <0%> (ø)` | |
   | [airflow/gcp/operators/bigquery.py](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree#diff-YWlyZmxvdy9nY3Avb3BlcmF0b3JzL2JpZ3F1ZXJ5LnB5) | `91.59% <100%> (ø)` | |
   | ... and [98 more](https://codecov.io/gh/apache/airflow/pull/7157/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=footer). Last update [92521aa...bf5d94d](https://codecov.io/gh/apache/airflow/pull/7157?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] xinbinhuang commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-576943577
 
 
   I have a probably unwelcome comment. But do we really need to have this config? I feel like this is something that should be checked during CI/CD before it deploys to Airflow. It gives an extra overhead (though trivial) when parsing the file. Also, it makes a bit harder to read the code (which is already a bit hard to read)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] stale[bot] commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-601150882
 
 
   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366661407
 
 

 ##########
 File path: tests/models/test_dagbag.py
 ##########
 @@ -151,6 +151,42 @@ def test_zip(self):
         dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
         self.assertTrue(dagbag.get_dag("test_zip_dag"))
 
+    @conf_vars({('core', 'max_tasks_per_dag'): '5'})
+    def test_process_file_max_task_check(self):
+        """
+        test if num_tasks > max_tasks_per_dag can be identified
+        """
+        a_dag_id = "example_short_circuit_operator"
 
 Review comment:
   ```suggestion
           a_dag_id = "test_example_bash_operator"
   ```
   
   I take a quick look. `TEST_DAGS_FOLDER` point to `airflow/tests/models/../dags` and this path not including dag `example_short_circuit_operator`. So you could not process_file `airflow/tests/models/../dags/example_short_circuit_operator.py`
   
   Maybe you should use others dag in `airflow/tests/models/../dags`(which `TEST_DAGS_FOLDER` point to), just like `test_example_bash_operator` and change `@conf_vars({('core', 'max_tasks_per_dag'): '5'})` to `@conf_vars({('core', 'max_tasks_per_dag'): '7'})`.
   
   So as others test @tooptoop4 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366241733
 
 

 ##########
 File path: airflow/models/dagbag.py
 ##########
 @@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
                         dag.full_filepath = filepath
                         if dag.fileloc != filepath and not is_zipfile:
                             dag.fileloc = filepath
-                    try:
-                        dag.is_subdag = False
-                        self.bag_dag(dag, parent_dag=dag, root_dag=dag)
-                        if isinstance(dag._schedule_interval, str):
-                            croniter(dag._schedule_interval)
-                        found_dags.append(dag)
-                        found_dags += dag.subdags
-                    except (CroniterBadCronError,
-                            CroniterBadDateError,
-                            CroniterNotAlphaError) as cron_e:
-                        self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+                    num_tasks = len(dag.tasks)
+                    if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
 
 Review comment:
   ```suggestion
                       if 0 < max_tasks_per_dag < num_tasks:
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-601195402
 
 
   be gone stalebot

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366360841
 
 

 ##########
 File path: airflow/models/dagbag.py
 ##########
 @@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
                         dag.full_filepath = filepath
                         if dag.fileloc != filepath and not is_zipfile:
                             dag.fileloc = filepath
-                    try:
-                        dag.is_subdag = False
-                        self.bag_dag(dag, parent_dag=dag, root_dag=dag)
-                        if isinstance(dag._schedule_interval, str):
-                            croniter(dag._schedule_interval)
-                        found_dags.append(dag)
-                        found_dags += dag.subdags
-                    except (CroniterBadCronError,
-                            CroniterBadDateError,
-                            CroniterNotAlphaError) as cron_e:
-                        self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+                    num_tasks = len(dag.tasks)
+                    if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
 
 Review comment:
   Really? but I remember we have a PR to do that, I will take a look

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366363132
 
 

 ##########
 File path: airflow/models/dagbag.py
 ##########
 @@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
                         dag.full_filepath = filepath
                         if dag.fileloc != filepath and not is_zipfile:
                             dag.fileloc = filepath
-                    try:
-                        dag.is_subdag = False
-                        self.bag_dag(dag, parent_dag=dag, root_dag=dag)
-                        if isinstance(dag._schedule_interval, str):
-                            croniter(dag._schedule_interval)
-                        found_dags.append(dag)
-                        found_dags += dag.subdags
-                    except (CroniterBadCronError,
-                            CroniterBadDateError,
-                            CroniterNotAlphaError) as cron_e:
-                        self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+                    num_tasks = len(dag.tasks)
+                    if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
 
 Review comment:
   @ashb  After some quick look, in
   https://github.com/apache/airflow/blob/4a344f13d26ecbb627bb9968895b290bfd86e4da/airflow/cli/commands/webserver_command.py#L90
   And it work in 
   ```py
   In [1]: x = 0
   
   In [2]: num_task = 10
   
   In [3]: 0 < x < num_task
   Out[3]: False
   
   In [4]: 0 <= x < num_task
   Out[4]: True
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366445111
 
 

 ##########
 File path: airflow/models/dagbag.py
 ##########
 @@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
                         dag.full_filepath = filepath
                         if dag.fileloc != filepath and not is_zipfile:
                             dag.fileloc = filepath
-                    try:
-                        dag.is_subdag = False
-                        self.bag_dag(dag, parent_dag=dag, root_dag=dag)
-                        if isinstance(dag._schedule_interval, str):
-                            croniter(dag._schedule_interval)
-                        found_dags.append(dag)
-                        found_dags += dag.subdags
-                    except (CroniterBadCronError,
-                            CroniterBadDateError,
-                            CroniterNotAlphaError) as cron_e:
-                        self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+                    num_tasks = len(dag.tasks)
+                    if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
 
 Review comment:
   confusion and need to test it proves the shorthand syntax is not clear, readable code

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] xinbinhuang commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-577052428
 
 
   You can check the task count in the CI/CD process after deploying to source control similar to other DAG validation tests that you may have ([relevant blog post](https://blog.usejournal.com/testing-in-airflow-part-1-dag-validation-tests-dag-definition-tests-and-unit-tests-2aa94970570c) & [gist](https://gist.github.com/criccomini/2862667822af7fae8b55682faef029a7). A simple example:
   
   ```python
   import unittest
   
   from airflow.models import DagBag
   
   class TestTaskCountPerDag(unittest.TestCase):
       """
       Test the if the number of tasks per dag exceed the threshold
       """
   
       MAX_TASKS_PER_DAG = 3
   
       def _get_dagbag(self):
           dag_folder = os.getenv('AIRFLOW_DAGS', False)
           self.assertTrue(
               dag_folder,
               'AIRFLOW_DAGS must be set to a folder that has DAGs in it.')
           return DagBag(dag_folder=dag_folder, include_examples=False)
   
       def test_max_tasks_per_dag(self):
           """
           Verify that tasks per dag do not exceed the threshold
           """
           dagbag = self._get_dagbag()
           dags = dagbag.dags
   
           for dag_name, dag in dags.items():
               self.assertLessEqual(dag.task_count, 
                                    MAX_TASKS_PER_DAG,
                                    msg=('Numer of tasks for DAG {} is greater than ' + 
                                        'the threshold {}').format(dag_name, MAX_TASKS_PER_DAG)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 removed a comment on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
tooptoop4 removed a comment on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574418492
 
 
   @zhongjiajie any idea on why test not working?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-578975631
 
 
   @kaxil WDYT

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574216275
 
 
   > > If want to upload huge dag, should we set default value to not unlimit? BTW, should we add some doc to describe it?
   > 
   > existing behaviour is unlimited, i want this to not break existing users, so it is 'opt-in'
   
   I prefer to add use fix value for it and change the `UPDATING.md`
   If we decide to use unlimited, at least we should add some document to remind users.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366660224
 
 

 ##########
 File path: airflow/models/dagbag.py
 ##########
 @@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
                         dag.full_filepath = filepath
                         if dag.fileloc != filepath and not is_zipfile:
                             dag.fileloc = filepath
-                    try:
-                        dag.is_subdag = False
-                        self.bag_dag(dag, parent_dag=dag, root_dag=dag)
-                        if isinstance(dag._schedule_interval, str):
-                            croniter(dag._schedule_interval)
-                        found_dags.append(dag)
-                        found_dags += dag.subdags
-                    except (CroniterBadCronError,
-                            CroniterBadDateError,
-                            CroniterNotAlphaError) as cron_e:
-                        self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+                    num_tasks = len(dag.tasks)
+                    if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
 
 Review comment:
   But code base already have some example and IDE like pycharm will give a hint change to `var1 < var2 < var3`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-577039208
 
 
   @xinbinhuang a dynamic dag may generate tasks at runtime based off a list of files in a 3rd party server so no way for cicd to trap it

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366363132
 
 

 ##########
 File path: airflow/models/dagbag.py
 ##########
 @@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
                         dag.full_filepath = filepath
                         if dag.fileloc != filepath and not is_zipfile:
                             dag.fileloc = filepath
-                    try:
-                        dag.is_subdag = False
-                        self.bag_dag(dag, parent_dag=dag, root_dag=dag)
-                        if isinstance(dag._schedule_interval, str):
-                            croniter(dag._schedule_interval)
-                        found_dags.append(dag)
-                        found_dags += dag.subdags
-                    except (CroniterBadCronError,
-                            CroniterBadDateError,
-                            CroniterNotAlphaError) as cron_e:
-                        self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+                    num_tasks = len(dag.tasks)
+                    if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
 
 Review comment:
   After some quick look, in
   https://github.com/apache/airflow/blob/4a344f13d26ecbb627bb9968895b290bfd86e4da/airflow/cli/commands/webserver_command.py#L90
   And it work in 
   ```py
   In [1]: x = 0
   
   In [2]: num_task = 10
   
   In [3]: 0 < x < num_task
   Out[3]: False
   
   In [4]: 0 <= x < num_task
   Out[4]: True
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574418492
 
 
   @zhongjiajie any idea on why test not working?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574213590
 
 
   > If want to upload huge dag, should we set default value to not unlimit? BTW, should we add some doc to describe it?
   
   existing behaviour is unlimited, i want this to not break existing users, so it is 'opt-in'

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-575794455
 
 
   travis happy

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] tooptoop4 edited a comment on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-577039208
 
 
   @xinbinhuang a dynamic dag may generate tasks at runtime based off a list of files in a 3rd party server (ie an sftp drive to be downloaded from and its csvs ingested) so no way for cicd to trap it

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#discussion_r366303058
 
 

 ##########
 File path: airflow/models/dagbag.py
 ##########
 @@ -300,26 +302,37 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
                         dag.full_filepath = filepath
                         if dag.fileloc != filepath and not is_zipfile:
                             dag.fileloc = filepath
-                    try:
-                        dag.is_subdag = False
-                        self.bag_dag(dag, parent_dag=dag, root_dag=dag)
-                        if isinstance(dag._schedule_interval, str):
-                            croniter(dag._schedule_interval)
-                        found_dags.append(dag)
-                        found_dags += dag.subdags
-                    except (CroniterBadCronError,
-                            CroniterBadDateError,
-                            CroniterNotAlphaError) as cron_e:
-                        self.log.exception("Failed to bag_dag: %s", dag.full_filepath)
+
+                    num_tasks = len(dag.tasks)
+                    if max_tasks_per_dag > 0 and num_tasks > max_tasks_per_dag:
 
 Review comment:
   That doens't wotk @zhongjiajie 
   
   ```
   In [1]: x = 0                                                                                                                                                  
   
   In [2]: num_task = 10                                                                                                                                          
   
   In [3]: 0 < x < num_task                                                                                                                                       
   Out[3]: False
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] zhongjiajie commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #7157: [AIRFLOW-6251] add config for max tasks per dag
URL: https://github.com/apache/airflow/pull/7157#issuecomment-574548195
 
 
   FYI, Travis-CI still sad, this time is timeout, may maybe you should test int you local breeze using `pytest -vs tests/cli/commands/test_task_command.py` for more detail

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services