You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/29 21:55:35 UTC

[GitHub] [airflow] ashb opened a new pull request #11957: Cache CLI parser objects

ashb opened a new pull request #11957:
URL: https://github.com/apache/airflow/pull/11957


   Right now, when we are using the CLI parser in LocalTaskJob and executor, having to re-create the parser each time adds a noticeable overhead for quick tasks -- by caching this we save 0.07s
   
   (I know that doesn't sound like much, but for an `echo true` BashOperator task that takes it from 0.25s to 0.19s - i.e. it cuts the time down by about 20%!)
   
   When coupled with #11956 this makes an ungodly difference to task run time for large dags with many short DAGs, more so than it looks like individually -- we've seen total "task lag" for a DAG (time between end date of one task and start date of next) go from 21 seconds down to 0.7 seconds - varies quite a lot based on number of tasks and DAG structure (linear vs binary tree etc.) -- but it always helps.
   
   Using CLI parser inside LocalTaskJob and the Executor is a bit of a hack, but this is a _very_ easy fix for the speed it gives!
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#discussion_r514604612



##########
File path: airflow/cli/cli_parser.py
##########
@@ -1457,6 +1458,7 @@ def _format_action(self, action: Action):
         return super()._format_action(action)
 
 
+@lru_cache(maxsize=2)

Review comment:
       Very good to know -- that wasn't really made clear from the docs. Doesn't make much difference in this case as it's called in-frequently, but useful.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#issuecomment-719078203


   > > Has the CI run full test-suite? I doubt, looks like only 1 test was run for Postgres -- Postgres 9.6 Py 3.6
   > 
   > It hasn't no -- do you want it to? If so apply the "full tests needed" label and then re-run the tests.
   > 
   > I think _this_ change is fine to not run full tests though. But I do wonder why it didn't
   
   No just curious, don't think the results might be any different, so feel free to merge. Just curious, why it didn't


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mjpieters commented on a change in pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
mjpieters commented on a change in pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#discussion_r514602542



##########
File path: airflow/cli/cli_parser.py
##########
@@ -1457,6 +1458,7 @@ def _format_action(self, action: Action):
         return super()._format_action(action)
 
 
+@lru_cache(maxsize=2)

Review comment:
       For reference, compare the wrapper functions that are returned for the maxsize=None case:
   
   https://github.com/python/cpython/blob/60324d26b58c89d68abb23fb42f1563d395c3910/Lib/functools.py#L548-L561
   
   with the maxsize=2 case:
   
   https://github.com/python/cpython/blob/60324d26b58c89d68abb23fb42f1563d395c3910/Lib/functools.py#L563-L620




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#issuecomment-719055116


   Static checks failing should be fixed by https://github.com/apache/airflow/pull/11944


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb merged pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
ashb merged pull request #11957:
URL: https://github.com/apache/airflow/pull/11957


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mjpieters commented on a change in pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
mjpieters commented on a change in pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#discussion_r514599968



##########
File path: airflow/cli/cli_parser.py
##########
@@ -1457,6 +1458,7 @@ def _format_action(self, action: Action):
         return super()._format_action(action)
 
 
+@lru_cache(maxsize=2)

Review comment:
       There is no point in giving this a max size here. There are two possible cache keys: True or False, so the cache is already bounded. But by setting maxsize to 2 the LRU cache decorator will lock and test the cache size every call, and that's just unnecessary.
   
   If you use `maxsize=None` no locking is needed and cache lookups are that little bit faster.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#discussion_r514611440



##########
File path: airflow/cli/cli_parser.py
##########
@@ -1457,6 +1458,7 @@ def _format_action(self, action: Action):
         return super()._format_action(action)
 
 
+@lru_cache(maxsize=2)

Review comment:
       Didn't know about functools.cache in Py 3.9, nice to know




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#issuecomment-719048325


   The PR should be OK to be merged with just subset of tests as it does not modify Core of Airflow. The committers might merge it or can add a label 'full tests needed' and re-run it to run all tests if they see it is needed!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#discussion_r514601059



##########
File path: airflow/cli/cli_parser.py
##########
@@ -1457,6 +1458,7 @@ def _format_action(self, action: Action):
         return super()._format_action(action)
 
 
+@lru_cache(maxsize=2)

Review comment:
       Oh thank you, I didn't realize that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#issuecomment-719047430


   /cc @mjpieters -- you may be interested in applying and testing with this and #11956.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#issuecomment-719072219


   Has the CI run full test-suite? I doubt, looks like only 1 test was run for Postgres -- Postgres 9.6 Py 3.6


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#issuecomment-719077463


   > Has the CI run full test-suite? I doubt, looks like only 1 test was run for Postgres -- Postgres 9.6 Py 3.6
   
   It hasn't no -- do you want it to? If so apply the "full tests needed" label and then re-run the tests.
   
   I think _this_ change is fine to not run full tests though. It didn't run full tests as there are a set of sub-tests specific to CLI, and `Adding CLI to selected files as 1 CLI files changed`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#issuecomment-719077463


   > Has the CI run full test-suite? I doubt, looks like only 1 test was run for Postgres -- Postgres 9.6 Py 3.6
   
   It hasn't no -- do you want it to? If so apply the "full tests needed" label and then re-run the tests.
   
   I think _this_ change is fine to not run full tests though. But I do wonder why it didn't


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#discussion_r514611440



##########
File path: airflow/cli/cli_parser.py
##########
@@ -1457,6 +1458,7 @@ def _format_action(self, action: Action):
         return super()._format_action(action)
 
 
+@lru_cache(maxsize=2)

Review comment:
       Didn't know about functools.cache, nice to know




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#issuecomment-719078106


   @kaxil Do you want full tests run here or are you happy with just this set?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#discussion_r514601204



##########
File path: airflow/cli/cli_parser.py
##########
@@ -1457,6 +1458,7 @@ def _format_action(self, action: Action):
         return super()._format_action(action)
 
 
+@lru_cache(maxsize=2)

Review comment:
       ```suggestion
   @lru_cache(maxsize=None)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mjpieters commented on a change in pull request #11957: Cache CLI parser objects

Posted by GitBox <gi...@apache.org>.
mjpieters commented on a change in pull request #11957:
URL: https://github.com/apache/airflow/pull/11957#discussion_r514607341



##########
File path: airflow/cli/cli_parser.py
##########
@@ -1457,6 +1458,7 @@ def _format_action(self, action: Action):
         return super()._format_action(action)
 
 
+@lru_cache(maxsize=2)

Review comment:
       Yeah, that's one of the reasons why 3.9 adds [`functools.cache()`](https://docs.python.org/3/library/functools.html#functools.cache), which is really nothing more than [a proxy for `lru_cache(maxsize=None)`](https://github.com/python/cpython/blob/60324d26b58c89d68abb23fb42f1563d395c3910/Lib/functools.py#L650-L652).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org