You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/20 10:53:07 UTC

[GitHub] [airflow] potiuk commented on a change in pull request #20407: Add ADR describing reasoning why we build images and security of it

potiuk commented on a change in pull request #20407:
URL: https://github.com/apache/airflow/pull/20407#discussion_r772266375



##########
File path: dev/breeze/doc/adr/0005-preventing-using-contributed-code-when-building-images.md
##########
@@ -0,0 +1,160 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [5. Preventing using contributed code when building images](#5-preventing-using-contributed-code-when-building-images)
+  - [Status](#status)
+  - [Context](#context)
+  - [Decision](#decision)
+  - [Consequences](#consequences)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+# 5. Preventing using contributed code when building images
+
+Date: 2021-12-19
+
+## Status
+
+Draft
+
+Builds on [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md)
+
+## Context
+
+As described in [4. Using Docker images as test environment](0004-using-docker-images-as-test-environment.md),
+Airflow CI system uses CI Docker image as consistent test execution environment. This environment provides
+cacheability and rebuild capabilities that allow the image to be rebuilt quickly, incrementally based on
+previous version of the images - whenever any of the source code, Python dependencies, System dependencies
+are changed (in optimal way depending on the change). However, even with optimalizations, rebuilding
+the image might take quite some time (when only sources change ~ 1 minute, but when system dependencies
+change ~ 10 minutes). In certain cases we run (for the same Python version) 20 jobs that require the same
+image as the environment, which in extreme cases would mean 20x10 = 200 build minutes on CI to
+just rebuild the same image.
+
+Therefore, there is a need to use the process that will allow to build the images once and share it with
+all the test jobs that need it. Another advantage of having such image is that since the image is stored
+in the registry with "commit" tag which allows to easily reproduce the environment used in particular
+build on CI. This is a nice side effect of such setup, one that can be useful in case a user does not want
+to lose time on checking out and rebuilding the environment locally for a build that comes from their own,
+or another developer's PR.
+
+This requires those prerequisites:
+
+  * the images need to be built in a workflow that has "write" access to store the images after they are
+    built, so that the images can then be "pulled" by the test jobs rather than rebuilt
+
+  * the process to build the images need to be secured from malicious users that would like to inject a
+    code in the build process to make bad use of the "write" access - for example to push the code
+    to the repository or to inject malicious code to "common" artifacts used by the jobs
+
+  * however, in order to build the images that reflect the PR of the user, they should be able to modify some
+    code that is usually used to build airflow packages (`setup.py`, airflow sources, scripts).
+
+GitHub Actions provide some features that we can use for that purpose:
+
+* there isa "pull request target" workflow that uses only the code present in the protected "main" version
+  of the code even if the code is modified in the PR. That code also has access to secrets stored in
+  Airflow repository (for example in our case secret used to push documentation to S3 Bucket). Those
+  secrets should not be made available to user code coming from PR.
+
+* this "pull request target" workflow can have granular "write" permissions assigned, so that each job
+  can be granted access to certain resources only
+
+* however, they are ways (using some GitHub Actions) to inject more permissions - for example
+  when "checkout" is performed by default using GitHub Actions checkout command by default the checked-out
+  repository has "write" access and any of the further steps in the job can "push" using this repository
+  (see https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status#GitHubActionsstatus-Security)
+  for details.
+
+* when using 3rd-party actions, you need to "pin" the actions to specific COMMIT SHA versions because
+  there is a risk, a 3rd-party might inject a code to your workflow by releasing and tagging new version
+  of their actions: https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions
+
+* some code that should be executed "inside" of the images when building should be possible to come from the
+  PR - for example code that is used inside the docker image to install mysql or postgres should be possible
+  to be changed in the PR - as long as we make sure the code is executed inside the Docker image or Docker
+  build process.
+
+Those protections are gradually strengthened (up until recently there was no granular access rights possible)
+however we decided to add certain rules of the "build" code that is executed in our GitHub "Build Images"
+"pull request target" to make sure
+
+## Decision
+
+The decision of our use of GitHub images is to utilise "Pull Request Workflow" to build the shared image,
+but to make sure that the following rules are in-place:
+
+1) We always use `persist-credentials: false` in all GitHub Action checkouts, to prevent unauthorized pushes
+   to our repository

Review comment:
       That's a good point. We could make a pre-commit check for it likely. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org