Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/01 22:53:25 UTC

[GitHub] [airflow] TamGB opened a new issue, #24794: Using _PIP_ADDITIONAL_REQUIREMENTS in official docker-compose

TamGB opened a new issue, #24794:
URL: https://github.com/apache/airflow/issues/24794

   ### Apache Airflow version
   
   2.3.2 (latest released)
   
   ### What happened
   
   I'm upgrading to Airflow 2.3.2 and, to do so, I downloaded the newest official docker-compose file.
   I added my pip requirements:
   `_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- apache-airflow-providers-amazon apache-airflow-providers-docker apache-airflow-providers-postgres}`
   
   However, doing so triggers a loop of pip install permission errors about installing as an admin instead of as the `airflow` user.
   [This seems directly related to this pull request](https://github.com/apache/airflow/pull/23517). Is this no longer possible?
   
   ### What you think should happen instead
   
   Libraries should be installed and Airflow should be up and running.
   
   ### How to reproduce
   
   Edit the official docker-compose to include the additional installations:
   ```
   
   # Licensed to the Apache Software Foundation (ASF) under one
   # or more contributor license agreements.  See the NOTICE file
   # distributed with this work for additional information
   # regarding copyright ownership.  The ASF licenses this file
   # to you under the Apache License, Version 2.0 (the
   # "License"); you may not use this file except in compliance
   # with the License.  You may obtain a copy of the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing,
   # software distributed under the License is distributed on an
   # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   # KIND, either express or implied.  See the License for the
   # specific language governing permissions and limitations
   # under the License.
   #
   
   # Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
   #
   # WARNING: This configuration is for local development. Do not use it in a production deployment.
   #
   # This configuration supports basic configuration using environment variables or an .env file
   # The following variables are supported:
   #
   # AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
   #                                Default: apache/airflow:2.3.2
   # AIRFLOW_UID                  - User ID in Airflow containers
   #                                Default: 50000
   # Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
   #
   # _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
   #                                Default: airflow
   # _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
   #                                Default: airflow
   # _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
   #                                Default: ''
   #
   # Feel free to modify this file to suit your needs.
   ---
   version: '3'
   x-airflow-common:
     &airflow-common
     # In order to add custom dependencies or upgrade provider packages you can use your extended image.
     # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
     # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
     image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.3.2}
     # build: .
     environment:
       &airflow-common-env
       AIRFLOW__CORE__EXECUTOR: CeleryExecutor
       AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
       # For backward compatibility, with Airflow <2.3
       AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
       AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
       AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
       AIRFLOW__CORE__FERNET_KEY: ''
       AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
       AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
       AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
       AIRFLOW__LOGGING__REMOTE_LOGGING: 'true'
       AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: 'aws_logging'
       AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: 's3://baitsu-logs/airflow/'
       AIRFLOW__LOGGING__ENCRYPT_S3_LOGS: 'false'
       _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- apache-airflow-providers-amazon apache-airflow-providers-docker apache-airflow-providers-postgres}
     volumes:
       - ./dags:/opt/airflow/dags
       - ./logs:/opt/airflow/logs
       - ./plugins:/opt/airflow/plugins
     user: "${AIRFLOW_UID:-50000}:0"
     depends_on:
       &airflow-common-depends-on
       redis:
         condition: service_healthy
       postgres:
         condition: service_healthy
   
   services:
     postgres:
       image: postgres:13
       environment:
         POSTGRES_USER: airflow
         POSTGRES_PASSWORD: airflow
         POSTGRES_DB: airflow
       volumes:
         - postgres-db-volume:/var/lib/postgresql/data
       healthcheck:
         test: ["CMD", "pg_isready", "-U", "airflow"]
         interval: 5s
         retries: 5
       restart: always
   
     redis:
       image: redis:latest
       expose:
         - 6379
       healthcheck:
         test: ["CMD", "redis-cli", "ping"]
         interval: 5s
         timeout: 30s
         retries: 50
       restart: always
   
     airflow-webserver:
       <<: *airflow-common
       command: webserver
       ports:
         - 8080:8080
       healthcheck:
         test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
         interval: 10s
         timeout: 10s
         retries: 5
       restart: always
       depends_on:
         <<: *airflow-common-depends-on
         airflow-init:
           condition: service_completed_successfully
   
     airflow-scheduler:
       <<: *airflow-common
       command: scheduler
       healthcheck:
         test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
         interval: 10s
         timeout: 10s
         retries: 5
       restart: always
       depends_on:
         <<: *airflow-common-depends-on
         airflow-init:
           condition: service_completed_successfully
   
     airflow-worker:
       <<: *airflow-common
       command: celery worker
       healthcheck:
         test:
           - "CMD-SHELL"
           - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
         interval: 10s
         timeout: 10s
         retries: 5
       environment:
         <<: *airflow-common-env
         # Required to handle warm shutdown of the celery workers properly
         # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
         DUMB_INIT_SETSID: "0"
       restart: always
       depends_on:
         <<: *airflow-common-depends-on
         airflow-init:
           condition: service_completed_successfully
   
     airflow-triggerer:
       <<: *airflow-common
       command: triggerer
       healthcheck:
         test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
         interval: 10s
         timeout: 10s
         retries: 5
       restart: always
       depends_on:
         <<: *airflow-common-depends-on
         airflow-init:
           condition: service_completed_successfully
   
     airflow-init:
       <<: *airflow-common
       entrypoint: /bin/bash
       # yamllint disable rule:line-length
       command:
         - -c
         - |
           function ver() {
             printf "%04d%04d%04d%04d" $${1//./ }
           }
           airflow_version=$$(gosu airflow airflow version)
           airflow_version_comparable=$$(ver $${airflow_version})
           min_airflow_version=2.2.0
           min_airflow_version_comparable=$$(ver $${min_airflow_version})
           if (( airflow_version_comparable < min_airflow_version_comparable )); then
             echo
             echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
             echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
             echo
             exit 1
           fi
           if [[ -z "${AIRFLOW_UID}" ]]; then
             echo
             echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
             echo "If you are on Linux, you SHOULD follow the instructions below to set "
             echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
             echo "For other operating systems you can get rid of the warning with manually created .env file:"
             echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#setting-the-right-airflow-user"
             echo
           fi
           one_meg=1048576
           mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
           cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
           disk_available=$$(df / | tail -1 | awk '{print $$4}')
           warning_resources="false"
           if (( mem_available < 4000 )) ; then
             echo
             echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
             echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
             echo
             warning_resources="true"
           fi
           if (( cpus_available < 2 )); then
             echo
             echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
             echo "At least 2 CPUs recommended. You have $${cpus_available}"
             echo
             warning_resources="true"
           fi
           if (( disk_available < one_meg * 10 )); then
             echo
             echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
             echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
             echo
             warning_resources="true"
           fi
           if [[ $${warning_resources} == "true" ]]; then
             echo
             echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
             echo "Please follow the instructions to increase amount of resources available:"
             echo "   https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#before-you-begin"
             echo
           fi
           mkdir -p /sources/logs /sources/dags /sources/plugins
           chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
           exec /entrypoint airflow version
       # yamllint enable rule:line-length
       environment:
         <<: *airflow-common-env
         _AIRFLOW_DB_UPGRADE: 'true'
         _AIRFLOW_WWW_USER_CREATE: 'true'
         _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
         _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
         _PIP_ADDITIONAL_REQUIREMENTS: ''
       user: "0:0"
       volumes:
         - .:/sources
   
     airflow-cli:
       <<: *airflow-common
       profiles:
         - debug
       environment:
         <<: *airflow-common-env
         CONNECTION_CHECK_MAX_COUNT: "0"
       # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
       command:
         - bash
         - -c
         - airflow
   
     # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
     # or by explicitly targeted on the command line e.g. docker-compose up flower.
     # See: https://docs.docker.com/compose/profiles/
     flower:
       <<: *airflow-common
       command: celery flower
       profiles:
         - flower
       ports:
         - 5555:5555
       healthcheck:
         test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
         interval: 10s
         timeout: 10s
         retries: 5
       restart: always
       depends_on:
         <<: *airflow-common-depends-on
         airflow-init:
           condition: service_completed_successfully
   
   volumes:
     postgres-db-volume:
   ```
   
   ### Operating System
   
   Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #24794: Using _PIP_ADDITIONAL_REQUIREMENTS in official docker-compose

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #24794: Using _PIP_ADDITIONAL_REQUIREMENTS in official docker-compose
URL: https://github.com/apache/airflow/issues/24794




[GitHub] [airflow] potiuk commented on issue #24794: Using _PIP_ADDITIONAL_REQUIREMENTS in official docker-compose

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #24794:
URL: https://github.com/apache/airflow/issues/24794#issuecomment-1172978715

   The `_PIP_ADDITIONAL_REQUIREMENTS` variable is strongly discouraged; it is a development-only feature.
   
   https://airflow.apache.org/docs/docker-stack/entrypoint.html#installing-additional-requirements
   
   > Installing requirements this way is a very convenient method of running Airflow, very useful for testing and debugging. However, do not be tricked by its convenience. You should never, ever use it in production environment. We have deliberately chose to make it a development/test dependency and we print a warning, whenever it is used. There is an inherent security-related issue with using this method in production. Installing the requirements this way can happen at literally any time - when your containers get restarted, when your machines in K8S cluster get restarted. In a K8S Cluster those events can happen literally any time. This opens you up to a serious vulnerability where your production environment might be brought down by a single dependency being removed from PyPI - or even dependency of your dependency. This means that you put your production service availability in hands of 3rd-party developers. At any time, any moment including weekends and holidays those 3rd party developers might bring your production Airflow instance down, without you even knowing it. This is a serious vulnerability that is similar to the infamous [leftpad](https://qz.com/646467/how-one-programmer-broke-the-internet-by-deleting-a-tiny-piece-of-code/) problem. You can fully protect against this case by building your own, immutable custom image, where the dependencies are baked in. You have been warned.
   
   And you are using it with docker-compose, which is a developer-only, quick-start feature:
   
   https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#production-readiness
   
   > DO NOT expect the Docker Compose below will be enough to run production-ready Docker Compose Airflow installation using it. This is truly quick-start docker-compose for you to get Airflow up and running locally and get your hands dirty with Airflow. Configuring a Docker-Compose installation that is ready for production requires an intrinsic knowledge of Docker Compose, a lot of customization and possibly even writing the Docker Compose file that will suit your needs from the scratch. It’s probably OK if you want to run Docker Compose-based deployment, but short of becoming a Docker Compose expert, it’s highly unlikely you will get robust deployment with it.
   
   > If you want to get an easy to configure Docker-based deployment that Airflow Community develops, supports and can provide support with deployment, you should consider using Kubernetes and deploying Airflow using [Official Airflow Community Helm Chart](https://airflow.apache.org/docs/helm-chart/stable/index.html).
   
   
   So, if you ask whether any of those work out of the box, the answer is "it depends". If you are not able to diagnose and fix problems that come from using these "development features", then you simply should not use them. Those options are not for users who open issues when they find errors, but for users who can understand, diagnose, and fix the problems they see when they use them. The comments quoted above mean that if you use those features without knowing exactly what you are doing, you are absolutely on your own to investigate, understand, and fix the problems they cause.
   
   You should embrace it and deal with the consequences of it.
   
   This also basically means that anything you find not working should, at best, be turned into a PR when you find a fix. A slightly better approach is to document in detail your findings as a developer (not a user) and what you have already done to make it work. Posting logs, hypotheses on what does not work, and what you have done to fix it is about the only good approach you can take there. The other one is not to use those "development-only" features and to focus on our Helm Chart (https://airflow.apache.org/docs/helm-chart/stable/index.html) and on using your own custom/extended images (https://airflow.apache.org/docs/docker-stack/build.html) rather than taking the time of others without understanding the "developer stack" for Airflow.
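   
   As a rough illustration of the custom-image route, the compose file quoted in the issue already carries a commented-out `build: .` line for this purpose. A minimal sketch of that change follows, assuming a Dockerfile is placed next to `docker-compose.yaml`; the Dockerfile contents shown in the comments are an assumption based on the packages listed in the issue, not a tested configuration.
   
   ```yaml
   # Sketch only: bake the providers into an extended image instead of
   # installing them at container start via _PIP_ADDITIONAL_REQUIREMENTS.
   #
   # Assumed Dockerfile in the same directory as docker-compose.yaml:
   #   FROM apache/airflow:2.3.2
   #   RUN pip install --no-cache-dir \
   #       apache-airflow-providers-amazon \
   #       apache-airflow-providers-docker \
   #       apache-airflow-providers-postgres
   x-airflow-common:
     &airflow-common
     # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.3.2}
     build: .
     environment:
       &airflow-common-env
       # Back to the documented default: nothing is installed at startup.
       _PIP_ADDITIONAL_REQUIREMENTS: ''
   ```
   
   With this in place, `docker-compose build` (as the comments in the compose file describe) builds the image once, so restarting containers no longer depends on PyPI being reachable.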
   
   Which I heartily invite you to do.
   
   * If you are a developer and you are able to analyse and provide fixes to development features, please by all means do so, but come with solutions or discussions and not "bugs" if you find a problem that you cannot solve by reading and following the source code.
   * If you are a user and want to raise a "bug" or "feature request" against the user-facing features, and you have a reproduction case, logs etc., feel free to do so as well.
   
   Converting it into a discussion. Bugs should not be raised against developer features.
   
   

