You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/06/02 13:24:15 UTC

[GitHub] [airflow] ashb opened a new pull request #9105: Cope with multiple processes get_remote_image_info in parallel

ashb opened a new pull request #9105:
URL: https://github.com/apache/airflow/pull/9105


   When I'd made a change to a large number of python files, running the
   flake8 pre-commit hook would fail without obvious error (as in no error
   was printed, but exit code was 1).
   
   In debugging this I switch the pre-commit to `require_serial: true` and
   the problem went way - the fix for this is:
   
   - Don't redirect stderr to /dev/null (that silences both our VERBOSE
     trace output, and the errors from docker)
   - Use `--cidfile` option to docker to create a random name and write the
     created container ID to a file
   
   I'm not sure why no one else has hit this before.
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Target Github ISSUE in description if exists
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-637635650


   I'd recommend to rebuild it then. `pre-commit run build` should do the job. Alternatively `./breeze static-check-all-files flake8` will run build as needed as well. The `breeze static-check-all-files` solution is better in the way that it also provides autocomplete for the pre-commit checks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-637638089


   `./breeze static-check-all-files flake8 --force-pull-images` will force-pull the image before you run it to save the time for local rebuilding. Takes 2 minutes to pull + build for me (though it depends on the internet speed of course) 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-637545546


   (I'm not sure how _I_ haven't hit this before either.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-637797720


   > BTW. What was the error? I would love to understand what happened there.
   
   The error was (at first) not printed cos we were redirecting stderr to dev null.
   
   Once I'd worked that out the error was "name already in use" -hence the cidfile change.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-637592414


   Yeah, I've not run in to this problem before either, I'm not sure what I changed in my env to trigger it, but now that I have it is _always_ happening.
   
   Possibly the fact that my image is _way_ out of date. `pre-commit run flake8 --all-files` was always failing for me.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638802871


   Sure. I expected that. My refactor is really about moving things around without changing it, so it will be easier for me to resolve conflicts.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638164755


   > Curious, I wonder why that precommit check didn't run for me locally.
   > 
   > Oh cos of the files regex -- which skips it unless one of the breeze files is changed.
   
   Yep. We run pre-commit with `--all-files` in PR just in case there are some hidden dependencies.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638183253


   > Generation of breeze docs fails with
   > 
   > ```
   > - hook id: update-breeze-file
   > - exit code: 1
   > 
   > /home/runner/work/airflow/airflow/scripts/ci/_utils.sh: line 902: FILES_TO_CLEANUP_ON_EXIT: unbound variable
   > ```
   > 
   > So I think the problem is still there.
   
   This is an odd one, and isn't actually to do with the array (testing locally with a scalar variable and it still has the same problem).
   
   Somehow, when the file is sourced within a function, the variables defined in the sourced files are no longer defined once the function returns. I guess bash is treating those as local variables in that case?! (We are importing _utils.sh from within `setup_default_breeze_variables`, moving the import outside the problem)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-637798777


   > `./breeze static-check-all-files flake8 --force-pull-images` will force-pull the image before you run it to save the time for local rebuilding. Takes 2 minutes to pull + build for me (though it depends on the internet speed of course) 
   > 
   
   It's always about 5-10 mins for me, at least. (It used to be 20mins to rebuild! But that was before some caching was added) so I've gotten in the habit of not rebuilding very frequently. It has _so far_ not been a problem for me.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434499797



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       I'll keep with the convention then.
   
   
   The reason I've made it an array is for the next tidyup commit around the cache directory (We're blanket deleting the cache folder, ignoring the 'SKIP_CACHE_DELETION' flag and calling `remove_cache_directory` unconditionally in script_end).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434501581



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       https://github.com/apache/airflow/pull/9105/commits/9d14f96ff12fd5834c859d7126bc11af92ffb2ac to update to current style




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb merged pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb merged pull request #9105:
URL: https://github.com/apache/airflow/pull/9105


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638190568


   K, I think I've worked out what is going on.
   
   `declare` inside a function appears to make the variables local (but I couldn't find this in any document). And since _utils.sh was being sourced from inside setup_default_breeze_variables, that made FILES_TO_CLEANUP_ON_EXIT local -> undefined when script_end came to run
   
   Fixed via https://github.com/apache/airflow/pull/9105/commits/7d3a15dc5c2616d2a04674196b575963077acd73 (which is probably a better fix anyway)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434496251



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       ```
   ❯ bash -c 'set -uo pipefail -e; declare -a foo=(); echo ${#foo[@]}'
   0
   ❯ bash --version
   GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin19)
   Copyright (C) 2007 Free Software Foundation, Inc.
   ```
   
   ```
   ❯ bash --version
   GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
   Copyright (C) 2019 Free Software Foundation, Inc.
   License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
   
   This is free software; you are free to change and redistribute it.
   There is NO WARRANTY, to the extent permitted by law.
   
   ❯ bash -c 'set -uo pipefail -e; declare -a foo=(); echo ${#foo[@]}' 
   0
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638158764


   Curious, I wonder why that precommit check didn't run for me locally.
   
   Oh cos of the files regex -- which skips it unless one of the breeze files is changed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434506267



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       I am ok with adopting a style guide for that if we are to adopt it consistently (including some dev scripts).  Maybe that's the right time to agree on it. 
   
   I think so far there was no convention whatsoever in many places. I am not tied to any convention and happy to adapt as long as we consciously agree do it and stick to it everywhere  :).
   
   I love to change my habits from time to time. Shall we ask people? I think also that might be a good time to discuss pros/cons of using bash or maybe adopt another approach in the future.  I can think of two approaches - python as you suggested or go as it produces statically linked multiple-platform binaries that you can downioad/install separately without dependencies and is already used in docker and many other tools like that). 
   
   So far we discussed it in slack. But maybe let's finish some of the important things first like images and chart - not too open too many discussions and work in parallel. We all already have too much on our plates.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-637639745


   BTW. What was the error? I would love to understand what happened there.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r433903219



##########
File path: scripts/ci/_utils.sh
##########
@@ -712,25 +717,29 @@ function get_local_image_info() {
 function get_remote_image_info() {
     set +e
     # Pull remote manifest image
-    if ! verbose_docker pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" >/dev/null 2>&1 ; then
+    if ! verbose_docker pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" >/dev/null; then
         echo
         echo "Remote docker registry unreachable"
         echo
         REMOTE_DOCKER_REGISTRY_UNREACHABLE="true"
         return
     fi
+    set -e
+
+    # Docker needs the file passed to --cidfile to not exist, so we can't use mktemp
+    TMP_CONTAINER_ID="remote-airflow-manifest-$$.container_id"
+    trap 'rm -f "$TMP_CONTAINER_ID"' EXIT
+
     TMP_MANIFEST_REMOTE_JSON=$(mktemp)
     TMP_MANIFEST_REMOTE_SHA=$(mktemp)
-    # delete container just in case
-    verbose_docker rm --force "remote-airflow-manifest" >/dev/null 2>&1
-    set -e
+
     # Create container out of the manifest image without runnning it
-    verbose_docker create --name "remote-airflow-manifest" "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" >/dev/null 2>&1
+    verbose_docker create --cidfile "${TMP_CONTAINER_ID}" "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}"

Review comment:
       Nice. Indeed it works better in parallel this way.

##########
File path: scripts/ci/_utils.sh
##########
@@ -39,10 +39,15 @@ function check_verbose_setup {
 # In case "VERBOSE" is set to "true" (--verbose flag in Breeze) all docker commands run will be
 # printed before execution
 function verbose_docker {
-    if [[ ${VERBOSE:="false"} == "true" ]]; then
+    if [[ ${VERBOSE_COMMANDS:=} == "true" ]]; then

Review comment:
       ```suggestion
       if [[ ${VERBOSE} == "true" && ${VERBOSE_COMMAND} != "true" ]]; then
            # do not print echo if VERSBOSE_COMMAND is set (set -x does it already)
            echo "docker" "${@}"
        fi
        docker "${@}"
   ```

##########
File path: scripts/ci/_utils.sh
##########
@@ -712,25 +717,29 @@ function get_local_image_info() {
 function get_remote_image_info() {
     set +e
     # Pull remote manifest image
-    if ! verbose_docker pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" >/dev/null 2>&1 ; then
+    if ! verbose_docker pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" >/dev/null; then
         echo
         echo "Remote docker registry unreachable"
         echo
         REMOTE_DOCKER_REGISTRY_UNREACHABLE="true"
         return
     fi
+    set -e
+
+    # Docker needs the file passed to --cidfile to not exist, so we can't use mktemp
+    TMP_CONTAINER_ID="remote-airflow-manifest-$$.container_id"
+    trap 'rm -f "$TMP_CONTAINER_ID"' EXIT

Review comment:
       this will override previous trap if you had it set (which is likely), The best solution I found is to wrap the trap code into a function and do it as follows:
   
   ```bash
   # adding trap to exiting trap
   HANDLERS="$( trap -p EXIT | cut -f2 -d \' )"
   # shellcheck disable=SC2064
   trap "${HANDLERS}${HANDLERS:+;}the_new_function" EXIT
   ``




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434500882



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       FWIW https://google.github.io/styleguide/shellguide.html#function-names disagrees with this style:
   
   > Function Names
   >
   > Lower-case, with underscores to separate words. 
   >
   > ...
   >
   > Variable Names
   >
   > As for function names.
   > 
   >
   > Constants and Environment Variable Names
   > 
   > All caps, separated with underscores, declared at the top of the file.
   >
   > Constants and anything exported to the environment should be capitalized.
   
   This variable is neither a constant, nor is it exported.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434463564



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       I think it will fail on Mac anyway :(. I had problems with that - empty array is treated as an unbound variable on macOS default Bash (they keep on 3.2.57 as it is GPL2 and anything above is GPLv3 which is not acceptable for Apple). Would be worth checking on Mac.
   
   While -u is useful, I found that sometimes I had to disable it when using arrays that might be empty.

##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       Or avoid arrays in cases like that. I think I'd stick to just FILE_TO_CLEAN potentially empty rather then files_to clean in plural.

##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       This should be UPPERCASE. The convention which I stick to everywhere else (And it's good to keep it consistent) is to keep variable names as UPPERCASE and function names lowercase. This is traditional approach and makes things much more obvious.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434500882



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       FWIW https://google.github.io/styleguide/shellguide.html#function-names disagrees with this style guide:
   
   > Function Names
   >
   > Lower-case, with underscores to separate words. 
   >
   > ...
   >
   > Variable Names
   >
   > As for function names.
   > 
   >
   > Constants and Environment Variable Names
   > 
   > All caps, separated with underscores, declared at the top of the file.
   >
   > Constants and anything exported to the environment should be capitalized.
   
   This variable is neither a constant, nor is it exported.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r433934787



##########
File path: scripts/ci/_utils.sh
##########
@@ -39,10 +39,15 @@ function check_verbose_setup {
 # In case "VERBOSE" is set to "true" (--verbose flag in Breeze) all docker commands run will be
 # printed before execution
 function verbose_docker {
-    if [[ ${VERBOSE:="false"} == "true" ]]; then
+    if [[ ${VERBOSE_COMMANDS:=} == "true" ]]; then

Review comment:
       Oh yes, can be simplified now. I originally had some branches doing `> /dev/null` but have since removed that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434442502



##########
File path: scripts/ci/_utils.sh
##########
@@ -39,10 +39,15 @@ function check_verbose_setup {
 # In case "VERBOSE" is set to "true" (--verbose flag in Breeze) all docker commands run will be
 # printed before execution
 function verbose_docker {
-    if [[ ${VERBOSE:="false"} == "true" ]]; then
+    if [[ ${VERBOSE_COMMANDS:=} == "true" ]]; then

Review comment:
       Fixed by https://github.com/apache/airflow/pull/9105/commits/ee5cb4c59d262bd2817d1bda56a3b2c0c31d6afe




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638190568


   K, I think I've worked out what is going on.
   
   `declare` inside a function appears to make the variables local (but I couldn't find this in any document). And since _utils.sh was being sourced, that made FILES_TO_CLEANUP_ON_EXIT local -> undefined when script_end came to run
   
   Fixed via https://github.com/apache/airflow/pull/9105/commits/7d3a15dc5c2616d2a04674196b575963077acd73 (which is probably a better fix anyway)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434529969



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       Yeah, if we're changing 
   I think for breeze/CI shell scripts it's mostly you and I who write the shell scripts, and so long as we document 
   
   > > Curious, I wonder why that precommit check didn't run for me locally.
   > > Oh cos of the files regex -- which skips it unless one of the breeze files is changed.
   > 
   > Yep. We run pre-commit with `--all-files` in PR just in case there are some hidden dependencies.
   
   Smart man. Cos there totally were here :D




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434442313



##########
File path: scripts/ci/_utils.sh
##########
@@ -712,25 +717,29 @@ function get_local_image_info() {
 function get_remote_image_info() {
     set +e
     # Pull remote manifest image
-    if ! verbose_docker pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" >/dev/null 2>&1 ; then
+    if ! verbose_docker pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" >/dev/null; then
         echo
         echo "Remote docker registry unreachable"
         echo
         REMOTE_DOCKER_REGISTRY_UNREACHABLE="true"
         return
     fi
+    set -e
+
+    # Docker needs the file passed to --cidfile to not exist, so we can't use mktemp
+    TMP_CONTAINER_ID="remote-airflow-manifest-$$.container_id"
+    trap 'rm -f "$TMP_CONTAINER_ID"' EXIT

Review comment:
       Fixed by https://github.com/apache/airflow/pull/9105/commits/bce2f6519589355d590af7837000fdf10d11d11b




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434442140



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       This looks odd, but without it it can fail with "unbound variable" on modern versions of bash when nothing is added to the array.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638232660


   @potiuk All tests green now. PTAL when you have a moment.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-637638089


   `./breeze static-check-all-files flake8 --force-pull-images`` will force-pull the image before you run it to save the time for local rebuilding. Takes 2 minutes to pull + build for me (though it depends on the internet speed of course) 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r433934391



##########
File path: scripts/ci/_utils.sh
##########
@@ -712,25 +717,29 @@ function get_local_image_info() {
 function get_remote_image_info() {
     set +e
     # Pull remote manifest image
-    if ! verbose_docker pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" >/dev/null 2>&1 ; then
+    if ! verbose_docker pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" >/dev/null; then
         echo
         echo "Remote docker registry unreachable"
         echo
         REMOTE_DOCKER_REGISTRY_UNREACHABLE="true"
         return
     fi
+    set -e
+
+    # Docker needs the file passed to --cidfile to not exist, so we can't use mktemp
+    TMP_CONTAINER_ID="remote-airflow-manifest-$$.container_id"
+    trap 'rm -f "$TMP_CONTAINER_ID"' EXIT

Review comment:
       Oh, you can only have a single trap? How pesky. I'll come up with something then.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#discussion_r434531556



##########
File path: scripts/ci/_utils.sh
##########
@@ -26,6 +26,8 @@ declare -a EXTRA_DOCKER_PROD_BUILD_FLAGS
 export EXTRA_DOCKER_FLAGS
 export EXTRA_DOCKER_PROD_BUILD_FLAGS
 
+declare -a files_to_cleanup_on_exit=()

Review comment:
       Yeah, I'm a fan of making change of style the only thing in a PR.
   
   Honestly, I think you and I are largely the only ones who work in any depth in the scripts/breeze (and mostly you right now, but I'm trying to change that a bit), so I'd probably say we decide on and document a style, then tag a the usual suspects on a PR would be enough -- (unlike the python side which affects many more people).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-637592414


   Yeah, I've not run in to this problem before either, I'm not sure what I changed in my env to trigger it, but now that I have it is _always_ happening.
   
   Possibly the fact that my image is _way_ out of date. `pre-commit run flake8 --all-files` was always failing for me.
   
   ```
   apache/airflow                                          master-python3.6-ci-manifest       c8637a24a07e        3 days ago          16.1kB
   apache/airflow                                          master-python3.6-ci                1ff720d21f67        4 days ago          4.44GB
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638154294


   Generation of breeze docs fails with
   
   ```
   - hook id: update-breeze-file
   - exit code: 1
   
   /home/runner/work/airflow/airflow/scripts/ci/_utils.sh: line 902: FILES_TO_CLEANUP_ON_EXIT: unbound variable
   ```
   
   So I think the problem is still there.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638166142


   > > Curious, I wonder why that precommit check didn't run for me locally.
   > > Oh cos of the files regex -- which skips it unless one of the breeze files is changed.
   > 
   > Yep. We run pre-commit with `--all-files` in PR just in case there are some hidden dependencies.
   
   Smart man -- cos there were here :D


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638158764


   Curious, I wonder why that precommit check didn't run for me locally.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9105: Cope with multiple processes get_remote_image_info in parallel

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9105:
URL: https://github.com/apache/airflow/pull/9105#issuecomment-638753192


   I'm going to merge this one now since your refactor already has a conflict.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org