You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/08 05:16:09 UTC

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

jorgecarleitao commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r538041597



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       Sorry it took me so long to go through this comment. I actually think that this PR is correct, but it surfaces a bigger problem present in master.
   
   Let's say that we have an image X (typically build via `archery build X`), that depends on a `base` image (typically build via `archery build base`).
   
   Let's analyze what happens in the following 4 cases:
   
   * push to PR with no change to base image's dockerfile
   * push to master with no change to base image's dockerfile
   * push to PR with a change to base image's dockerfile
   * push to master with a change to base image's dockerfile
   
   Let's also call the hash of the current base image in the registry as `base_v1`.
   
   ## push to PR with no change to base image's dockerfile
   
   When a PR is open, image X is built using the dockerfile on the PR for that image, and it uses `base_v1` from the registry to base its build from.
   
   ## push to master with no change to base image's dockerfile
   
   Same as before.
   
   ## push to PR with a change to base image's dockerfile
   
   In this situation, the PR triggers two CI jobs: the build of image `X` and the build image `base`. Since these two jobs are independent workflows, image `X` will pick `base_v1`, while the base image job will build `base_v2`.
   
   This happens because `docker-compose run X` does not know that `X` depends on the compose job `base` to create the image `base`, and instead has that image available from the remote docker registry `${REPO}`.
   
   Because PRs do not push images to the registry, a new push to the PR will not alter this behavior: the image `X` will continue to be built based on `base_v1`. Thus, any error derived from building image `X` from the new image `base_v2` will not surface.
   
   ## push to master with a change to base image's dockerfile (e.g. PR is merged)
   
   In this situation, there is a race condition (undefined behavior): image X will be built using `base_v1` if it starts running before the flow `build image base` finishes (the likely scenario), or it picks `base_v2` if the other flow has finished.
   
   Thus, atm, whenever a PR changes both dockerfiles, `X` and `base`, the image X on the registry is based of `base_v1`, but `base:latest` points to `base_v2`.
   
   Thus, the pipeline will actually pass in master, even if the build of `X` using `base_v2` is incorrect(!). On the _next_ push to master that triggers `X` to be built, the error surfaces 🤯
   
   ## Summary
   
   In both cases where the dockerfile of `base` changes, that change is not propagated to the build of `X`. Therefore, IMO this PR is correct: changes to the `dockerfile` of image `base` do not require to trigger a change to the build of the image `X`, because there are no constraints in the jobs to build `X` to wait for the build of image `base`.
   
   This issue seems to indicate that our builds based of docker images that depend on other docker images that are build using `docker-compose build` may be too complex as there is hidden state through the registry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org