Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/19 17:13:45 UTC

[GitHub] [airflow] potiuk opened a new pull request #20407: Add ADR describing reasoning why we build images and security of it

potiuk opened a new pull request #20407:
URL: https://github.com/apache/airflow/pull/20407


   The ADRs document the decision to use Docker images as a common
   environment for CI and development, and also why the images
   should be built in a secure way.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #20407:
URL: https://github.com/apache/airflow/pull/20407


   





[GitHub] [airflow] potiuk commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r772593480



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [5. Preventing using contributed code when building images](#5-preventing-using-contributed-code-when-building-images)
+  - [Status](#status)
+  - [Context](#context)
+  - [Decision](#decision)
+  - [Consequences](#consequences)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+the Airflow CI system uses the CI Docker image as a consistent test execution environment. This environment
+provides caching and rebuild capabilities that allow the image to be rebuilt quickly and incrementally from
+the previous version of the image whenever the source code, Python dependencies, or system dependencies
+change (in a way optimized for the kind of change). However, even with these optimizations, rebuilding
+the image can take a while (about 1 minute when only sources change, but about 10 minutes when system
+dependencies change). In certain cases we run 20 jobs (for the same Python version) that require the same
+image as their environment, which in extreme cases would mean 20 x 10 = 200 build minutes on CI just to
+rebuild the same image.
+
+Therefore, we need a process that builds the image once and shares it with all the test jobs that need
+it. Another advantage is that the image is stored in the registry with a "commit" tag, which makes it
+easy to reproduce the environment used in a particular CI build. This is a nice side effect of such a
+setup, and it is useful when a user does not want to lose time on checking out and rebuilding the
+environment locally for a build that comes from their own,
+or another developer's PR.
+
+This requires the following prerequisites:
+
+  * the images need to be built in a workflow that has "write" access to store the images after they are
+    built, so that the images can then be "pulled" by the test jobs rather than rebuilt
+
+  * the process that builds the images needs to be secured against malicious users who would like to inject
+    code into the build process to misuse the "write" access - for example to push code
+    to the repository or to inject malicious code into "common" artifacts used by the jobs
+
+  * however, in order to build images that reflect the user's PR, the user should be able to modify some
+    code that is usually used to build airflow packages (`setup.py`, airflow sources, scripts).
+
+GitHub Actions provides some features that we can use for that purpose:
+
+* there is a "pull request target" workflow that uses only the code present in the protected "main" version
+  of the code even if the code is modified in the PR. That code also has access to secrets stored in
+  the Airflow repository (for example, in our case, the secret used to push documentation to an S3 bucket).
+  Those secrets should not be made available to user code coming from the PR.
+
+* this "pull request target" workflow can have granular "write" permissions assigned, so that each job
+  can be granted access to certain resources only
+
+* however, there are ways (using some GitHub Actions) to inject more permissions - for example,
+  when "checkout" is performed using the GitHub Actions checkout action, by default the checked-out
+  repository has "write" access and any of the further steps in the job can "push" using this repository
+  (see https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status#GitHubActionsstatus-Security
+  for details).
+
+* when using 3rd-party actions, you need to "pin" the actions to specific commit SHA versions because
+  there is a risk that a 3rd party might inject code into your workflow by releasing and tagging a new
+  version of their action: https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions
+
+* some code that is executed "inside" the images during the build should be allowed to come from the
+  PR - for example, it should be possible to change in the PR the code that is used inside the Docker image
+  to install mysql or postgres - as long as we make sure the code is executed inside the Docker image or
+  the Docker build process.
+
+Those protections are gradually being strengthened (until recently no granular access rights were possible),
+however we decided to add certain rules for the "build" code that is executed in our GitHub "Build Images"
+"pull request target" to make sure
+
+## Decision
+
+For our use of GitHub images, the decision is to utilise the "Pull Request Workflow" to build the shared image,
+while making sure that the following rules are in place:
+
+1) We always use `persist-credentials: false` in all GitHub Action checkouts, to prevent unauthorized pushes
+   to our repository

Review comment:
       PR ready @eladkal :)







[GitHub] [airflow] potiuk commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r772266375



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+## Decision
+
+For our use of GitHub images, the decision is to utilise the "Pull Request Workflow" to build the shared image,
+while making sure that the following rules are in place:
+
+1) We always use `persist-credentials: false` in all GitHub Action checkouts, to prevent unauthorized pushes
+   to our repository

Review comment:
       That's a good point. We could likely make a pre-commit check for it.
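
A pre-commit check along those lines could look like the sketch below. It is a line-based heuristic under stated assumptions, not the check used in the Airflow repository: the function names `split_steps` and `unsafe_checkouts` are hypothetical, the workflow text is scanned without a YAML parser, and a step is assumed to start at any line whose first non-blank characters are `- `.

```python
import re

# Matches a step that uses the standard checkout action, at any version.
CHECKOUT = re.compile(r"uses:\s*actions/checkout@")
# Matches the setting that the ADR rule requires on every checkout.
PERSIST_FALSE = re.compile(r"persist-credentials:\s*false\b")


def split_steps(workflow_text: str) -> list[str]:
    """Split workflow text into chunks, one per YAML list item ("- ...")."""
    steps: list[str] = []
    current: list[str] = []
    for line in workflow_text.splitlines():
        if line.lstrip().startswith("- "):
            # A new list item starts: close the previous chunk, if any.
            if current:
                steps.append("\n".join(current))
            current = [line]
        elif current:
            current.append(line)
    if current:
        steps.append("\n".join(current))
    return steps


def unsafe_checkouts(workflow_text: str) -> list[str]:
    """Return checkout steps that do not set persist-credentials: false."""
    return [
        step
        for step in split_steps(workflow_text)
        if CHECKOUT.search(step) and not PERSIST_FALSE.search(step)
    ]
```

A hook built on this would run `unsafe_checkouts` over every file in `.github/workflows` and fail when the returned list is not empty.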







[GitHub] [airflow] Bowrna commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
Bowrna commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r773868799



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+## Decision
+
+For our use of GitHub images, the decision is to utilise the "Pull Request Workflow" to build the shared image,
+while making sure that the following rules are in place:
+
+1) We always use `persist-credentials: false` in all GitHub Action checkouts, to prevent unauthorized pushes
+   to our repository
+
+```yaml
+      - uses: actions/checkout@v2
+        with:
+          ref: ${{ env.TARGET_COMMIT_SHA }}
+          persist-credentials: false
+          fetch-depth: 2
+```
+
+2) We use submodules (in `.github/actions`) to keep the actions that we use (except for the standard
+   GitHub-managed actions). Submodules provide a few features, such as automated linking to a specific commit
+   SHA (not a tag) and integration with the Pull Request review process when someone creates a PR to upgrade
+   the action, which makes them ideal for keeping the action updated securely and seamlessly when needed.
+
+3) No user code coming from the PR can be executed directly in the "Build image" workflow. For example, the
+   build scripts should not import `setup.py` or execute bash scripts coming from places other than:
+   * scripts/ci
+   * dev/
+   All the other sources are taken from the PR, but those two folders are overwritten by the `main` version
+   of the scripts during the "Build image" workflow. This means that it is safe to execute those
+   scripts in the "Build Image" workflow.
+
+4) All the other code coming from the PR can only be executed inside the Docker container started by the
+   `scripts/ci` or `dev` scripts. The Docker containers should not have any volumes mounted from
+   the host that would enable them to read or modify values or environment variables that are present in
+   the host CI environment.
+
+5) The "docker build" commands automatically execute Dockerfile commands inside such a container, so there
+   is no risk that sensitive information from the host will be passed to them.
+
+6) In case any information is needed from the "sources" of Airflow (such as name of the current branch or
+   version of airflow) it should be extracted by parsing of the incoming scripts but not executing them. Under
+   any circumstances any of the scripts coming from the outside of `scripts/ci` and `dev` should be
+   executed on the host during the image building process.

Review comment:
       @potiuk Under any circumstances any of the scripts coming from the outside of scripts/ci and dev should **not** be executed on the host during the image building process.
   Am I right? The "not" is missing, which would mean you allow any scripts to be executed.
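
Rule 6 of the quoted ADR asks for values to be read from PR sources by parsing, not executing. As a minimal sketch of that idea, assuming the value of interest is a plain module-level string assignment, Python's `ast` module can read it without running any of the file. The helper name `read_string_assignment` and the sample `version = ...` line are illustrative and are not taken from the Airflow build scripts.

```python
import ast
from typing import Optional


def read_string_assignment(source: str, name: str) -> Optional[str]:
    """Return the string literal assigned to `name` at module level.

    The source is only parsed into a syntax tree; none of it is executed.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == name:
                value = node.value
                # Accept only plain string literals, never computed values.
                if isinstance(value, ast.Constant) and isinstance(value.value, str):
                    return value.value
    return None


# Hypothetical PR-provided file: the os.system call is parsed, never run.
untrusted_source = (
    "import os\n"
    "os.system('echo should-not-run')\n"
    "version = '2.3.0.dev0'\n"
)
airflow_version = read_string_assignment(untrusted_source, "version")
```

Restricting the result to string literals is a deliberate choice here: anything that would need evaluation to produce a value is rejected rather than computed on the host.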







[GitHub] [airflow] potiuk commented on pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#issuecomment-997428703


   cc: @bowrna @edithturn  @eladkal  @xurro - those are the next ADRs describing the decisions behind why we are using Docker images for our CI/development, as part of some clarifications and answering some questions from @Bowrna  and part of our 
   
   I tried to capture two things:
   * why we really need the images and why the "smaller" projects approach with virtualenv won't work
   * what are the decisions around security and building the images in a separate workflow  @ashb It would be great if you could take a look and check whether I got all the security aspects right. 
   





[GitHub] [airflow] potiuk commented on pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#issuecomment-997451067


   > @potiuk. From what I see our CI workflows, not only involve Python dependencies but bash script(which we are trying to change), tests, database, document validation. I understand that to build all dependencies only in Python virtualenv enough, for the CI of other components we need docker to replicate the real environment in small containers that are also reusable.
   
   For the smaller scripts we will not need Docker; a virtualenv will be enough. The thing is that our small Python scripts will only use a "small" set of dependencies, which is fine to keep in their own virtualenv. Those scripts will come from the `dev` folder and we make sure that the scripts are taken from `main` and not from the PR. So we will not use Docker for those. 
   
   For example you can see here:
   
   https://github.com/apache/airflow/blob/4ac35d723b73d02875d56bf000aafd2235ef0f4a/.github/workflows/ci.yml#L240
   
   This is the action that runs the pytest tests of our "Breeze2".
   
   What it does:
   * it sets the working directory to "./dev/breeze"
   * it sets up Python 3.7 with a dependency cache that understands pip install
   * it installs the dependencies specified in the setup.* files of "./dev/breeze"
   * and runs the tests afterwards
   
   This will be a very similar pattern for most of our jobs that do not need the full "550" dependencies of airflow installed. In our case the CI image is really needed to run airflow tests, to run mypy and flake and a few other things (for example building providers - which also needs to be done inside the container for security).
   
   Most of our other scripts (free space for example) are perfectly fine with the virtualenv support provided by GitHub Actions, like below: 
   
   ```yaml
     run-new-breeze-tests:
       timeout-minutes: 10
       name: Breeze2 tests
       runs-on: ${{ fromJson(needs.build-info.outputs.runsOn) }}
       needs: [build-info]
       defaults:
         run:
           shell: bash
           working-directory: ./dev/breeze
       steps:
         - uses: actions/checkout@v2
           with:
             persist-credentials: false
         - uses: actions/setup-python@v2
           with:
             python-version: '3.7'
             cache: 'pip'
         - run: pip install -e .
         - run: python3 -m pytest -n auto --color=yes
   ```
   
   
   
   
   





[GitHub] [airflow] potiuk commented on pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#issuecomment-997981631


   @uranusjr and @ashb as you were involved in the CI scripting comments, and the ADRs are mainly for the future/reasoning, I'd also love your comments before merging that one. 
   
   I will keep on updating the ADRs as we progress with the implementation of the Breeze rewrite and as more questions are asked by @edithturn and @Bowrna  - this is actually a great opportunity to capture all the reasoning behind many of the decisions made :D.





[GitHub] [airflow] eladkal commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r772649257



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [5. Preventing using contributed code when building images](#5-preventing-using-contributed-code-when-building-images)
+  - [Status](#status)
+  - [Context](#context)
+  - [Decision](#decision)
+  - [Consequences](#consequences)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimalizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same
+image as the environment, which in extreme cases would mean 20x10 = 200 build minutes on CI to
+just rebuild the same image.
+
+Therefore, there is a need to use the process that will allow to build the images once and share it with
+all the test jobs that need it. Another advantage of having such image is that since the image is stored
+in the registry with "commit" tag which allows to easily reproduce the environment used in particular
+build on CI. This is a nice side effect of such setup, one that can be useful in case a user does not want
+to lose time on checking out and rebuilding the environment locally for a build that comes from their own,
+or another developer's PR.
+
+This requires those prerequisites:
+
+  * the images need to be built in a workflow that has "write" access to store the images after they are
+    built, so that the images can then be "pulled" by the test jobs rather than rebuilt
+
+  * the process to build the images need to be secured from malicious users that would like to inject a
+    code in the build process to make bad use of the "write" access - for example to push the code
+    to the repository or to inject malicious code to "common" artifacts used by the jobs
+
+  * however, in order to build the images that reflect the PR of the user, they should be able to modify some
+    code that is usually used to build airflow packages (`setup.py`, airflow sources, scripts).
+
+GitHub Actions provide some features that we can use for that purpose:
+
+* there isa "pull request target" workflow that uses only the code present in the protected "main" version
+  of the code even if the code is modified in the PR. That code also has access to secrets stored in
+  Airflow repository (for example in our case secret used to push documentation to S3 Bucket). Those
+  secrets should not be made available to user code coming from PR.
+
+* this "pull request target" workflow can have granular "write" permissions assigned, so that each job
+  can be granted access to certain resources only
+
+* however, they are ways (using some GitHub Actions) to inject more permissions - for example
+  when "checkout" is performed by default using GitHub Actions checkout command by default the checked-out
+  repository has "write" access and any of the further steps in the job can "push" using this repository
+  (see https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status#GitHubActionsstatus-Security)
+  for details.
+
+* when using 3rd-party actions, you need to "pin" the actions to specific COMMIT SHA versions because
+  there is a risk, a 3rd-party might inject a code to your workflow by releasing and tagging new version
+  of their actions: https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions
+
+* some code that should be executed "inside" of the images when building should be possible to come from the
+  PR - for example code that is used inside the docker image to install mysql or postgres should be possible
+  to be changed in the PR - as long as we make sure the code is executed inside the Docker image or Docker
+  build process.
+
+Those protections are gradually strengthened (up until recently there was no granular access rights possible)
+however we decided to add certain rules of the "build" code that is executed in our GitHub "Build Images"
+"pull request target" to make sure
+
+## Decision
+
+The decision of our use of GitHub images is to utilise "Pull Request Workflow" to build the shared image,
+but to make sure that the following rules are in-place:
+
+1) We always use `persist-credentials: false` in all GitHub Action checkouts, to prevent unauthorized pushes
+   to our repository

Review comment:
       thanks!







[GitHub] [airflow] potiuk commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r773837262



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimalizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same

Review comment:
       Yep. You can see the jobs in .github/workflows/ci.yml - and when you create a PR you will see the red/green status in the 'checks' tab - each such red/green entry is a job. For example, 'tests postgres python 3.7' is a job.







[GitHub] [airflow] potiuk commented on pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#issuecomment-998768568


   Merging for now - those ADRs are still Draft, so we can always modify them later. 





[GitHub] [airflow] github-actions[bot] commented on pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#issuecomment-997925844


   The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] potiuk commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r773837772



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimalizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same

Review comment:
       https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#jobs







[GitHub] [airflow] eladkal commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r771984937



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimalizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same
+image as the environment, which in extreme cases would mean 20x10 = 200 build minutes on CI to
+just rebuild the same image.
+
+Therefore, there is a need to use the process that will allow to build the images once and share it with
+all the test jobs that need it. Another advantage of having such image is that since the image is stored
+in the registry with "commit" tag which allows to easily reproduce the environment used in particular
+build on CI. This is a nice side effect of such setup, one that can be useful in case a user does not want
+to lose time on checking out and rebuilding the environment locally for a build that comes from their own,
+or another developer's PR.
+
+This requires those prerequisites:
+
+  * the images need to be built in a workflow that has "write" access to store the images after they are
+    built, so that the images can then be "pulled" by the test jobs rather than rebuilt
+
+  * the process to build the images need to be secured from malicious users that would like to inject a
+    code in the build process to make bad use of the "write" access - for example to push the code
+    to the repository or to inject malicious code to "common" artifacts used by the jobs
+
+  * however, in order to build the images that reflect the PR of the user, they should be able to modify some
+    code that is usually used to build airflow packages (`setup.py`, airflow sources, scripts).
+
+GitHub Actions provide some features that we can use for that purpose:
+
+* there isa "pull request target" workflow that uses only the code present in the protected "main" version

Review comment:
       ```suggestion
   * there is a "pull request target" workflow that uses only the code present in the protected "main" version
   ```

##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimalizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same
+image as the environment, which in extreme cases would mean 20x10 = 200 build minutes on CI to
+just rebuild the same image.
+
+Therefore, there is a need to use the process that will allow to build the images once and share it with
+all the test jobs that need it. Another advantage of having such image is that since the image is stored
+in the registry with "commit" tag which allows to easily reproduce the environment used in particular
+build on CI. This is a nice side effect of such setup, one that can be useful in case a user does not want
+to lose time on checking out and rebuilding the environment locally for a build that comes from their own,
+or another developer's PR.
+
+This requires those prerequisites:
+
+  * the images need to be built in a workflow that has "write" access to store the images after they are
+    built, so that the images can then be "pulled" by the test jobs rather than rebuilt
+
+  * the process to build the images need to be secured from malicious users that would like to inject a
+    code in the build process to make bad use of the "write" access - for example to push the code
+    to the repository or to inject malicious code to "common" artifacts used by the jobs
+
+  * however, in order to build the images that reflect the PR of the user, they should be able to modify some
+    code that is usually used to build airflow packages (`setup.py`, airflow sources, scripts).
+
+GitHub Actions provide some features that we can use for that purpose:
+
+* there isa "pull request target" workflow that uses only the code present in the protected "main" version
+  of the code even if the code is modified in the PR. That code also has access to secrets stored in
+  Airflow repository (for example in our case secret used to push documentation to S3 Bucket). Those
+  secrets should not be made available to user code coming from PR.
+
+* this "pull request target" workflow can have granular "write" permissions assigned, so that each job
+  can be granted access to certain resources only
+
+* however, they are ways (using some GitHub Actions) to inject more permissions - for example
+  when "checkout" is performed by default using GitHub Actions checkout command by default the checked-out
+  repository has "write" access and any of the further steps in the job can "push" using this repository
+  (see https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status#GitHubActionsstatus-Security)
+  for details.
+
+* when using 3rd-party actions, you need to "pin" the actions to specific COMMIT SHA versions because
+  there is a risk, a 3rd-party might inject a code to your workflow by releasing and tagging new version
+  of their actions: https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions
+
+* some code that should be executed "inside" of the images when building should be possible to come from the
+  PR - for example code that is used inside the docker image to install mysql or postgres should be possible
+  to be changed in the PR - as long as we make sure the code is executed inside the Docker image or Docker
+  build process.
+
+Those protections are gradually strengthened (up until recently there was no granular access rights possible)
+however we decided to add certain rules of the "build" code that is executed in our GitHub "Build Images"
+"pull request target" to make sure
+
+## Decision
+
+The decision of our use of GitHub images is to utilise "Pull Request Workflow" to build the shared image,
+but to make sure that the following rules are in-place:
+
+1) We always use `persist-credentials: false` in all GitHub Action checkouts, to prevent unauthorized pushes
+   to our repository

Review comment:
       Can this be enforced? Should we implement some kind of verification for this (rather than relying on remembering)?
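
   One possible shape for such a check, as a rough sketch only: a small Python script (for example run as a pre-commit hook) that scans workflow files and reports every `actions/checkout` step lacking `persist-credentials: false`. The function name `find_unsafe_checkouts` and the line-based scanning are assumptions made for illustration, not something that exists in the repository; a real check would probably parse the YAML properly.

```python
import re

# A step line that uses the checkout action, with or without a leading "- ".
USES_CHECKOUT = re.compile(r"^\s*(?:-\s+)?uses:\s*actions/checkout@")
# The setting we require somewhere inside the same step.
PERSIST_FALSE = re.compile(r"^\s*persist-credentials:\s*[\"']?false[\"']?\s*(?:#.*)?$")


def _indent(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def find_unsafe_checkouts(workflow_text):
    """Return 1-based line numbers of checkout steps without `persist-credentials: false`."""
    lines = workflow_text.splitlines()
    unsafe = []
    for idx, line in enumerate(lines):
        if not USES_CHECKOUT.match(line):
            continue
        # Walk back to the "- " line that opens this step.
        start = idx
        while start > 0 and not lines[start].lstrip().startswith("- "):
            start -= 1
        step_indent = _indent(lines[start])
        # The step ends at the next non-blank line indented no deeper than its dash.
        end = start + 1
        while end < len(lines) and (
            not lines[end].strip() or _indent(lines[end]) > step_indent
        ):
            end += 1
        if not any(PERSIST_FALSE.match(item) for item in lines[start:end]):
            unsafe.append(idx + 1)
    return unsafe
```

   It could be fed the text of each file under `.github/workflows/` and fail the hook when the returned list is non-empty.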







[GitHub] [airflow] Bowrna commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
Bowrna commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r773802437



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimalizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same
+image as the environment, which in extreme cases would mean 20x10 = 200 build minutes on CI to
+just rebuild the same image.
+
+Therefore, there is a need to use the process that will allow to build the images once and share it with
+all the test jobs that need it. Another advantage of having such image is that since the image is stored
+in the registry with "commit" tag which allows to easily reproduce the environment used in particular
+build on CI. This is a nice side effect of such setup, one that can be useful in case a user does not want
+to lose time on checking out and rebuilding the environment locally for a build that comes from their own,
+or another developer's PR.
+
+This requires those prerequisites:
+
+  * the images need to be built in a workflow that has "write" access to store the images after they are
+    built, so that the images can then be "pulled" by the test jobs rather than rebuilt
+
+  * the process to build the images need to be secured from malicious users that would like to inject a
+    code in the build process to make bad use of the "write" access - for example to push the code
+    to the repository or to inject malicious code to "common" artifacts used by the jobs
+
+  * however, in order to build the images that reflect the PR of the user, they should be able to modify some
+    code that is usually used to build airflow packages (`setup.py`, airflow sources, scripts).
+
+GitHub Actions provide some features that we can use for that purpose:
+
+* there is a "pull request target" workflow that uses only the code present in the protected "main" version
+  of the code even if the code is modified in the PR. That code also has access to secrets stored in
+  Airflow repository (for example in our case secret used to push documentation to S3 Bucket). Those
+  secrets should not be made available to user code coming from PR.
+
+* this "pull request target" workflow can have granular "write" permissions assigned, so that each job
+  can be granted access to certain resources only
+
+* however, they are ways (using some GitHub Actions) to inject more permissions - for example
+  when "checkout" is performed by default using GitHub Actions checkout command by default the checked-out
+  repository has "write" access and any of the further steps in the job can "push" using this repository
+  (see https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status#GitHubActionsstatus-Security)
+  for details.
+
+* when using 3rd-party actions, you need to "pin" the actions to specific COMMIT SHA versions because
+  there is a risk, a 3rd-party might inject a code to your workflow by releasing and tagging new version
+  of their actions: https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions
+
+* some code that should be executed "inside" of the images when building should be possible to come from the
+  PR - for example code that is used inside the docker image to install mysql or postgres should be possible
+  to be changed in the PR - as long as we make sure the code is executed inside the Docker image or Docker
+  build process.
+
+Those protections are gradually strengthened (up until recently there was no granular access rights possible)
+however we decided to add certain rules of the "build" code that is executed in our GitHub "Build Images"
+"pull request target" to make sure
+
+## Decision
+
+The decision of our use of GitHub images is to utilise "Pull Request Workflow" to build the shared image,
+but to make sure that the following rules are in-place:
+
+1) We always use `persist-credentials: false` in all GitHub Action checkouts, to prevent unauthorized pushes
+   to our repository
+
+```yaml
+      - uses: actions/checkout@v2
+        with:
+          ref: ${{ env.TARGET_COMMIT_SHA }}
+          persist-credentials: false
+          fetch-depth: 2
+```
+
+2) we use submodules (in .github/actions) where we keep actions that we are using (except of the standard
+   GitHub managed actions). Submodules provide few features such as - automated linking to specific commit
+   SHA (not tag) and integration with Pull Request Review process when someone creates a PR to upgrade the
+   action, which makes it ideal to securely and seamlessly keep the action updated if needed.
+
+3) No user code coming from the PR can be executed directly in the "Build image" workflow. For example, the
+   build scripts should not import `setup.py` or execute bash scripts coming from other places than:
+   * scripts/ci
+   * dev/
+   All the other sources are taken from the PR, but those two folders, during the "Build image" workflow
+   are overwritten by the `main` version of the scripts. This means that it is safe to execute those
+   scripts in the "Build Image" workflow.
+
+4) All the other code coming from the PR can only be executed inside the docker container started by the
+   `scripts/ci` or `dev` scripts. The docker containers should not have any volumes mounted from
+   the host that will enable them to read or modify values, environment variables that are present in the
+   Host CI environment.
+
+5) The "docker build" commands automatically execute Dockerfile commands inside such a container, so there
+   is no risk that sensitive information from the host will be passed to them.
+
+6) In case any information is needed from the "sources" of Airflow (such as name of the current branch or
+   version of airflow) it should be extracted by parsing of the incoming scripts but not executing them. Under
+   any circumstances any of the scripts coming from the outside of `scripts/ci` and `dev` should be
+   executed on the host during the image building process.

Review comment:
       Under any circumstances any of the scripts coming from the outside of `scripts/ci` and `dev` should be
      executed on the host during the image building process.
   
   Is the above line correct?







[GitHub] [airflow] potiuk commented on pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#issuecomment-997451557


   > Here is my question: if we have docker containers for airflow or breeze, are these displayed in the GitHub Actions environment, where the build and tests take place? For these ADRs, what images are you referring to? :)
   > Thanks in advance
   
   All the details on where the images are built and set are described in https://github.com/apache/airflow/blob/main/CI.rst#ci-sequence-diagrams . In short, we have two workflows (in https://github.com/apache/airflow/tree/main/.github/workflows):
   
   * build-images.yml -> this is the "pull_request_target" workflow that builds the images and pushes them to "ghcr.io" (write access)
   * ci.yml -> this is the "pull request" workflow that executes tests, provider builds, etc. Some of its jobs require the "CI images", which are pulled before the tests are executed. This is a "read-only" workflow.
   
   
   





[GitHub] [airflow] Bowrna commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
Bowrna commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r773774181



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [5. Preventing using contributed code when building images](#5-preventing-using-contributed-code-when-building-images)
+  - [Status](#status)
+  - [Context](#context)
+  - [Decision](#decision)
+  - [Consequences](#consequences)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same

Review comment:
       @potiuk could you explain to me what "jobs" means in this case?







[GitHub] [airflow] potiuk commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r772593374



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same
+image as the environment, which in extreme cases would mean 20x10 = 200 build minutes on CI to
+just rebuild the same image.
+
+Therefore, there is a need to use the process that will allow to build the images once and share it with
+all the test jobs that need it. Another advantage of having such image is that since the image is stored
+in the registry with "commit" tag which allows to easily reproduce the environment used in particular
+build on CI. This is a nice side effect of such setup, one that can be useful in case a user does not want
+to lose time on checking out and rebuilding the environment locally for a build that comes from their own,
+or another developer's PR.
+
+This requires those prerequisites:
+
+  * the images need to be built in a workflow that has "write" access to store the images after they are
+    built, so that the images can then be "pulled" by the test jobs rather than rebuilt
+
+  * the process to build the images need to be secured from malicious users that would like to inject a
+    code in the build process to make bad use of the "write" access - for example to push the code
+    to the repository or to inject malicious code to "common" artifacts used by the jobs
+
+  * however, in order to build the images that reflect the PR of the user, they should be able to modify some
+    code that is usually used to build airflow packages (`setup.py`, airflow sources, scripts).
+
+GitHub Actions provide some features that we can use for that purpose:
+
+* there is a "pull request target" workflow that uses only the code present in the protected "main" version
+  of the code even if the code is modified in the PR. That code also has access to secrets stored in
+  Airflow repository (for example in our case secret used to push documentation to S3 Bucket). Those
+  secrets should not be made available to user code coming from PR.
+
+* this "pull request target" workflow can have granular "write" permissions assigned, so that each job
+  can be granted access to certain resources only
+
+* however, there are ways (using some GitHub Actions) to inject more permissions - for example,
+  when "checkout" is performed using the GitHub Actions checkout command, by default the checked-out
+  repository has "write" access and any of the further steps in the job can "push" using this repository
+  (see https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status#GitHubActionsstatus-Security)
+  for details.
+
+* when using 3rd-party actions, you need to "pin" the actions to specific COMMIT SHA versions because
+  there is a risk that a 3rd party might inject code into your workflow by releasing and tagging a new version
+  of their actions: https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions
+
+* some code that should be executed "inside" of the images when building should be possible to come from the
+  PR - for example code that is used inside the docker image to install mysql or postgres should be possible
+  to be changed in the PR - as long as we make sure the code is executed inside the Docker image or Docker
+  build process.
+
+Those protections are gradually strengthened (up until recently there was no granular access rights possible)
+however we decided to add certain rules of the "build" code that is executed in our GitHub "Build Images"
+"pull request target" to make sure
+
+## Decision
+
+The decision of our use of GitHub images is to utilise "Pull Request Workflow" to build the shared image,
+but to make sure that the following rules are in-place:
+
+1) We always use `persist-credentials: false` in all GitHub Action checkouts, to prevent unauthorized pushes
+   to our repository

Review comment:
       https://github.com/apache/airflow/pull/20430







[GitHub] [airflow] potiuk commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r773839641



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same
+image as the environment, which in extreme cases would mean 20x10 = 200 build minutes on CI to
+just rebuild the same image.
+
+Therefore, there is a need to use the process that will allow to build the images once and share it with
+all the test jobs that need it. Another advantage of having such image is that since the image is stored
+in the registry with "commit" tag which allows to easily reproduce the environment used in particular
+build on CI. This is a nice side effect of such setup, one that can be useful in case a user does not want
+to lose time on checking out and rebuilding the environment locally for a build that comes from their own,
+or another developer's PR.
+
+This requires those prerequisites:
+
+  * the images need to be built in a workflow that has "write" access to store the images after they are
+    built, so that the images can then be "pulled" by the test jobs rather than rebuilt
+
+  * the process to build the images need to be secured from malicious users that would like to inject a
+    code in the build process to make bad use of the "write" access - for example to push the code
+    to the repository or to inject malicious code to "common" artifacts used by the jobs
+
+  * however, in order to build the images that reflect the PR of the user, they should be able to modify some
+    code that is usually used to build airflow packages (`setup.py`, airflow sources, scripts).
+
+GitHub Actions provide some features that we can use for that purpose:
+
+* there is a "pull request target" workflow that uses only the code present in the protected "main" version
+  of the code even if the code is modified in the PR. That code also has access to secrets stored in
+  Airflow repository (for example in our case secret used to push documentation to S3 Bucket). Those
+  secrets should not be made available to user code coming from PR.
+
+* this "pull request target" workflow can have granular "write" permissions assigned, so that each job
+  can be granted access to certain resources only
+
+* however, there are ways (using some GitHub Actions) to inject more permissions - for example,
+  when "checkout" is performed using the GitHub Actions checkout command, by default the checked-out
+  repository has "write" access and any of the further steps in the job can "push" using this repository
+  (see https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status#GitHubActionsstatus-Security)
+  for details.
+
+* when using 3rd-party actions, you need to "pin" the actions to specific COMMIT SHA versions because
+  there is a risk that a 3rd party might inject code into your workflow by releasing and tagging a new version
+  of their actions: https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions
+
+* some code that should be executed "inside" of the images when building should be possible to come from the
+  PR - for example code that is used inside the docker image to install mysql or postgres should be possible
+  to be changed in the PR - as long as we make sure the code is executed inside the Docker image or Docker
+  build process.
+
+Those protections are gradually strengthened (up until recently there was no granular access rights possible)
+however we decided to add certain rules of the "build" code that is executed in our GitHub "Build Images"
+"pull request target" to make sure
+
+## Decision
+
+The decision of our use of GitHub images is to utilise "Pull Request Workflow" to build the shared image,
+but to make sure that the following rules are in-place:
+
+1) We always use `persist-credentials: false` in all GitHub Action checkouts, to prevent unauthorized pushes
+   to our repository
+
+```yaml
+      - uses: actions/checkout@v2
+        with:
+          ref: ${{ env.TARGET_COMMIT_SHA }}
+          persist-credentials: false
+          fetch-depth: 2
+```
+
+2) we use submodules (in .github/actions) where we keep actions that we are using (except of the standard
+   GitHub managed actions). Submodules provide few features such as - automated linking to specific commit
+   SHA (not tag) and integration with Pull Request Review process when someone creates a PR to upgrade the
+   action, which makes it ideal to securely and seamlessly keep the action updated if needed.
+
+3) No user code coming from the PR can be executed directly in the "Build image" workflow. For example, the
+   build scripts should not import `setup.py` or execute bash scripts coming from other places than:
+   * scripts/ci
+   * dev/
+   All the other sources are taken from the PR, but those two folders, during the "Build image" workflow
+   are overwritten by the `main` version of the scripts. This means that it is safe to execute those
+   scripts in the "Build Image" workflow.
+
+4) All the other code coming from the PR can only be executed inside the docker container started by the
+   `scripts/ci` or `dev` scripts. The docker containers should not have any volumes mounted from
+   the host that will enable them to read or modify values, environment variables that are present in the
+   Host CI environment.
+
+5) The "docker build" commands automatically execute Dockerfile commands inside such a container, so there
+   is no risk that sensitive information from the host will be passed to them.
+
+6) In case any information is needed from the "sources" of Airflow (such as name of the current branch or
+   version of airflow) it should be extracted by parsing of the incoming scripts but not executing them. Under
+   any circumstances any of the scripts coming from the outside of `scripts/ci` and `dev` should be
+   executed on the host during the image building process.

Review comment:
       Yep. This is advanced security stuff for our 'build' jobs. We really need to make sure that anything we execute in build.yml comes from one of those two folders only, because everything else can be modified by the users making the PR, and they might get access to stuff we do not want them to.
   
   What we do in our workflows is make sure that those two folders (for now only scripts/ci) do not come from the user's PR but from the 'main' branch of our repository.
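
To illustrate rule 6 of the ADR (parse, do not execute), here is a minimal Python sketch. It is not code from this PR. It assumes the untrusted file contains a top-level `version = "..."` assignment; the function name `read_version` and the sample file content are hypothetical. The source is only parsed with the standard `ast` module, so nothing that a PR author put in the file runs on the host.

```python
import ast
from typing import Optional


def read_version(source: str) -> Optional[str]:
    """Return the string assigned to a top-level `version` name.

    The source is parsed into a syntax tree and never imported or exec'd.
    Returns None if no such assignment exists or its value is not a string literal.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "version":
                try:
                    # literal_eval only accepts literals; calls and names raise ValueError.
                    value = ast.literal_eval(node.value)
                except ValueError:
                    return None
                return value if isinstance(value, str) else None
    return None


# Hypothetical PR-supplied content: the os.system call is never run,
# because the text is only parsed.
untrusted = 'import os\nos.system("echo pwned")\nversion = "2.3.0.dev0"\n'
print(read_version(untrusted))
```

A value computed at runtime (for example `version = get_version()`) is deliberately reported as missing instead of being evaluated, which keeps the host side free of PR-controlled code.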







[GitHub] [airflow] potiuk edited a comment on pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#issuecomment-997428703


   cc: @bowrna @edithturn  @eladkal  @xurro - these are the next ADRs describing the decisions behind why we are using docker images for our CI/development, written as part of some clarifications and answers to questions from @Bowrna and as part of our rewrite project: https://github.com/apache/airflow/projects/13
   
   I tried to capture two things:
   * why we really need the images and the "smaller" projects approach with virtualenv won't work
   * what are the decisions around security and building the images in a separate workflow  @ashb It would be great if you could take a look and check whether I got all the security aspects right.
   





[GitHub] [airflow] edithturn commented on pull request #20407: Add ADR describing reasoning why we build images and security of it

Posted by GitBox <gi...@apache.org>.
edithturn commented on pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#issuecomment-997447747


   @potiuk. From what I see, our CI workflows involve not only Python dependencies but also bash scripts (which we are trying to change), tests, databases, and document validation. I understand that building all dependencies in a Python virtualenv alone is not enough; for the CI of other components we need docker to replicate the real environment in small containers that are also reusable.
   
   Here is my question: if we have docker containers for airflow or breeze, are these displayed in the GitHub Actions environment, where the build and tests take place? For these ADRs, what images are you referring to? :)
   Thanks in advance

