You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/03 17:18:02 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #8824: ARROW-10798: [CI] Made workflows be less triggered.

jorgecarleitao opened a new pull request #8824:
URL: https://github.com/apache/arrow/pull/8824


   This PR restricts some CI workflows from getting triggered when any dockerfile changes, thereby reducing the workload on our queue when someone modifies a dockerfile.
   
   This is particularly important when testing the CI itself, as devs will trigger multiple runs of the CI for single change of a single dockerfile, which triggers the execution of a large number of workflows.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r538745060



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       I don't think that's correct but maybe @kszucs can weigh in definitively. My understanding of how docker works (which may be wrong) is that if image `Y` is based dockerfile `y`, and `y` is changed, then `Y` gets rebuilt locally in `docker-compose build Y`. If `y` is based on image `X` (and therefore dockerfile `x`), and `x` is changed, building `Y` will first build `X` and then proceed using `y` to build `Y`.  
   
   I see a few options: 
   
   1. Empirically determine which is correct
   2. Defensively, make the change I suggested (add cpp dockerfile triggers to ruby, python, and r workflows) and we're safe either way
   3. Make no additional change, as you propose, and risk merging changes to C++ dockerfiles that break downstream jobs that we'll only see after merging. 
   
   The risks entailed by #3 are probably low--these files change infrequently, and unless we're doing a wholesale refactor of docker images, they're probably low risk changes. At the same time, the cost of doing #2 is also very low and prevents the surprise failures that #3 could later cause. Given the low costs (and my hesitance to make definitive statements about how docker/docker-compose handles dependent dockerfiles), I still advocate for option #2. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #8824:
URL: https://github.com/apache/arrow/pull/8824


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r538749024



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       Sorry for the confusion: I agree with you and I do think that we should use 2 regardless, I side-tracked into a potential issue.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#issuecomment-871346705


   I think the safest to keep the current configuration with `docker/**` so we won't forget to update the workflow files if we change the image dependencies. A better approach could be to generate the workflow files from the `docker-compose.yml` but currently it may not worth the effort.
   
   I'm closing this PR for now, feel free to reopen if there is anything I missed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs closed pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
kszucs closed pull request #8824:
URL: https://github.com/apache/arrow/pull/8824


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r538041597



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       Sorry it took me so long to go through this comment. I actually think that this PR is correct, but it surfaces a bigger problem present in master.
   
   Let's say that we have an image X (typically build via `archery build X`), that depends on a `base` image (typically build via `archery build base`).
   
   Let's analyze what happens in the following 4 cases:
   
   * push to PR with no change to base image's dockerfile
   * push to master with no change to base image's dockerfile
   * push to PR with a change to base image's dockerfile
   * push to master with a change to base image's dockerfile
   
   Let's also call the hash of the current base image in the registry as `base_v1`.
   
   ## push to PR with no change to base image's dockerfile
   
   When a PR is open, image X is built using the dockerfile on the PR for that image, and it uses `base_v1` from the registry to base its build from.
   
   ## push to master with no change to base image's dockerfile
   
   Same as before.
   
   ## push to PR with a change to base image's dockerfile
   
   In this situation, the PR triggers two CI jobs: the build of image `X` and the build image `base`. Since these two jobs are independent workflows, image `X` will pick `base_v1`, while the base image job will build `base_v2`.
   
   This happens because `docker-compose run X` does not know that `X` depends on the compose job `base` to create the image `base`, and instead has that image available from the remote docker registry `${REPO}`.
   
   Because PRs do not push images to the registry, a new push to the PR will not alter this behavior: the image `X` will continue to be built based on `base_v1`. Thus, any error derived from building image `X` from the new image `base_v2` will not surface.
   
   ## push to master with a change to base image's dockerfile (e.g. PR is merged)
   
   In this situation, there is a race condition (undefined behavior): image X will be built using `base_v1` if it starts running before the flow `build image base` finishes (the likely scenario), or it picks `base_v2` if the other flow has finished.
   
   Thus, atm, whenever a PR changes both dockerfiles, `X` and `base`, the image X on the registry is based of `base_v1`, but `base:latest` points to `base_v2`.
   
   Thus, the pipeline will actually pass in master, even if the build of `X` using `base_v2` is incorrect(!). On the _next_ push to master that triggers `X` to be built, the error surfaces 🤯
   
   ## Summary
   
   In both cases where the dockerfile of `base` changes, that change is not propagated to the build of `X`. Therefore, IMO this PR is correct: changes to the `dockerfile` of image `base` do not require to trigger a change to the build of the image `X`, because there are no constraints in the jobs to build `X` to wait for the build of image `base`.
   
   This issue seems to indicate that our builds based of docker images that depend on other docker images that are build using `docker-compose build` may be too complex as there is hidden state through the registry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#issuecomment-738178566


   @kszucs Can you take a look here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r538041597



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       Sorry it took me so long to go through this comment. I actually think that this PR is correct, but it surfaces a bigger problem present in master.
   
   Let's say that we have an image X (typically build via `archery build X`), that depends on a `base` image (typically build via `archery build base`).
   
   Let's analyze what happens in the following 4 cases:
   
   * push to PR with no change to base image's dockerfile
   * push to master with no change to base image's dockerfile
   * push to PR with a change to base image's dockerfile
   * push to master with a change to base image's dockerfile
   
   Let's also call the hash of the current base image in the registry as `base_v1`.
   
   ## push to PR with no change to base image's dockerfile
   
   When a PR is open, image X is built using the dockerfile on the PR for that image, and it uses `base_v1` from the registry to base its build from.
   
   ## push to master with no change to base image's dockerfile
   
   Same as before.
   
   ## push to PR with a change to base image's dockerfile
   
   In this situation, the PR triggers two CI jobs: the build of image `X` and the build image `base`. Since these two jobs are independent workflows, image `X` will pick `base_v1`, while the base image job will build `base_v2`.
   
   This happens because `docker-compose run X` does not know that `X` depends on the compose job `base` to create the image `base`, and instead has that image available from the remote docker registry `${REPO}`.
   
   Because PRs do not push images to the registry, a new push to the PR will not alter this behavior: the image `X` will continue to be built based on `base_v1`. Thus, any error derived from building image `X` from the new image `base_v2` will not surface.
   
   ## push to master with a change to base image's dockerfile (e.g. PR is merged)
   
   In this situation, there is a race condition (undefined behavior): image X will be built using `base_v1` if it starts running before the flow `build image base` finishes (the likely scenario), or it picks `base_v2` if the other flow has finished.
   
   Thus, atm, whenever a PR changes both dockerfiles, `X` and `base`, the image X on the registry is based of `base_v1`, but `base:latest` points to `base_v2`.
   
   Thus, even if the new dockerfile is incompatible with `base_v2`, the CI will actually pass in master because it picks `base_v1`. On the _next_ push to master that triggers `X` to be built, it will pick `base_v2`, and the error surfaces 🤯
   
   ## Summary
   
   In both cases where the dockerfile of `base` changes, that change is not propagated to the build of `X`. Therefore, IMO this PR is correct: changes to the `dockerfile` of image `base` do not require to trigger a change to the build of the image `X`, because there are no constraints in the jobs to build `X` to wait for the build of image `base`.
   
   This issue seems to indicate that our builds based of docker images that depend on other docker images that are build using `docker-compose build` may be too complex as there is hidden state through the registry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r535507642



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       I think you need to add this one (or similar) to the R, Python, and Ruby jobs too. Their dockerfiles generally build on the cpp images (you can trace through the `base` parameter in docker-compose.yml, e.g. https://github.com/apache/arrow/blob/master/docker-compose.yml#L430)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r538756496



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       I just did a quick check to verify this, and AFAI can tell my hypothesis holds:
   
   ```docker-compose
   services:
     s1:
       image: t1
       build:
         context: .
         dockerfile: docker1.dockerfile
   
     s2:
       image: t2
       build:
         context: .
         dockerfile: docker2.dockerfile
   ```
   
   ```dockerfile
   # docker1.dockerfile
   FROM python:3.7.4
   
   CMD ls
   ```
   
   ```dockerfile
   # docker2.dockerfile
   FROM t1
   
   CMD ls
   ```
   
   ```bash
   docker-compose build s2
   
   Building s2
   Step 1/3 : FROM t1
   ERROR: Service 's2' failed to build : pull access denied for t1, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
   ```
   
   My understanding is that docker has no idea about `docker-compose` and how images get built; it only knows about where images are located, via registries. To declare such a dependency of image builds, Docker uses Multi-stage builds. Furthermore, docker-compose also does not declare image dependencies, only service dependencies (i.e. containers, not images).
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r661392557



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       Indeed, neither does docker nor docker-compose have an idea about the image hierarchy but archery does. 
   
   `archery build` and `archery run` commands are idempotent, meaning that the produced docker images are going to be the same for the same input configuration regardless of the state in the docker registry. 
   
   See the commands called underneath the `archery docker run` command:
   
   ```
   ❯ archery docker --dry-run run ubuntu-docs
   COMPOSE_DOCKER_CLI_BUILD=1 docker-compose pull --ignore-pull-failures ubuntu-cpp
   COMPOSE_DOCKER_CLI_BUILD=1 docker-compose pull --ignore-pull-failures ubuntu-python
   COMPOSE_DOCKER_CLI_BUILD=1 docker-compose pull --ignore-pull-failures ubuntu-docs
   COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-cpp
   COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-python
   COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build --build-arg BUILDKIT_INLINE_CACHE=1 ubuntu-docs
   COMPOSE_DOCKER_CLI_BUILD=1 docker-compose run --rm ubuntu-docs
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8824: ARROW-10798: [CI] Made workflows be less triggered.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#issuecomment-738167777


   https://issues.apache.org/jira/browse/ARROW-10798


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r538041597



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       Sorry it took me so long to go through this comment. I actually think that this PR is correct, but it surfaces a bigger problem present in master.
   
   Let's say that we have an image X (typically build via `archery build X`), that depends on a `base` image (typically build via `archery build base`).
   
   Let's analyze what happens in the following 4 cases:
   
   * push to PR with no change to base image's dockerfile
   * push to master with no change to base image's dockerfile
   * push to PR with a change to base image's dockerfile
   * push to master with a change to base image's dockerfile
   
   Let's also call the hash of the current base image in the registry as `base_v1`.
   
   ## push to PR with no change to base image's dockerfile
   
   When a PR is open, image X is built using the dockerfile on the PR for that image, and it uses `base_v1` from the registry to base its build from.
   
   ## push to master with no change to base image's dockerfile
   
   Same as before.
   
   ## push to PR with a change to base image's dockerfile
   
   In this situation, the PR triggers two CI jobs: the build of image `X` and the build image `base`. Since these two jobs are independent workflows, image `X` will pick `base_v1`, while the base image job will build `base_v2`.
   
   This happens because `docker-compose run X` does not know that `X` depends on the compose job `base` to create the image `base`, and instead has that image available from the remote docker registry `${REPO}`.
   
   Because PRs do not push images to the registry, a new push to the PR will not alter this behavior: the image `X` will continue to be built based on `base_v1`. Thus, any error derived from building image `X` from the new image `base_v2` will not surface.
   
   ## push to master with a change to base image's dockerfile (e.g. PR is merged)
   
   In this situation, there is a race condition (undefined behavior): image X will be built using `base_v1` if it starts running before the flow `build image base` finishes (the likely scenario), or it picks `base_v2` if the other flow has finished.
   
   Thus, atm, whenever a PR changes both dockerfiles, `X` and `base`, the image X on the registry is based of `base_v1`, but `base:latest` points to `base_v2`.
   
   Thus, even if the new dockerfile is incompatible with `base_v2`, the CI will actually pass in master. On the _next_ push to master that triggers `X` to be built, it will pick `base_v2`, and the error surfaces 🤯
   
   ## Summary
   
   In both cases where the dockerfile of `base` changes, that change is not propagated to the build of `X`. Therefore, IMO this PR is correct: changes to the `dockerfile` of image `base` do not require to trigger a change to the build of the image `X`, because there are no constraints in the jobs to build `X` to wait for the build of image `base`.
   
   This issue seems to indicate that our builds based of docker images that depend on other docker images that are build using `docker-compose build` may be too complex as there is hidden state through the registry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#issuecomment-738178051


   ccing @pitrou , since I have now added cpp to the mix.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r661396352



##########
File path: .github/workflows/ruby.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/ruby.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/linux-apt-ruby.dockerfile'

Review comment:
       We would need to list all dockerfiles used in the parent images, in this case:
   
   ```
   ci/docker/ubuntu-18.04-cpp.dockerfile
   ci/docker/linux-apt-c-glib.dockerfile
   ci/docker/linux-apt-ruby.dockerfile
   ```
   
   though if we wan't to run `UBUNTU=20.04 archery docker run ubuntu-ruby` then:
   
   ```
   ci/docker/ubuntu-20.04-cpp.dockerfile
   ci/docker/linux-apt-c-glib.dockerfile
   ci/docker/linux-apt-ruby.dockerfile
   ```
   
   This hierarchy is encoded in the [docker-compose.yml](https://github.com/apache/arrow/blob/master/docker-compose.yml#L69) depending on the env [variables as well](https://github.com/apache/arrow/blob/master/docker-compose.yml#L542).
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org