Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/05/30 19:42:46 UTC

[GitHub] [airflow] potiuk opened a new pull request #16170: Adding extra requirements for build and runtime of the PROD image.

potiuk opened a new pull request #16170:
URL: https://github.com/apache/airflow/pull/16170


   This PR adds the capability of adding extra requirements to the PROD image:
   
   1) During the build by placing requirements.txt in the
      ``docker-context-files`` folder
   
   2) During execution of the container - by passing the
      _PIP_ADDITIONAL_REQUIREMENTS variable
   
   The second case is only useful during quick tests/development and
   should not be used in production.
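   
   For illustration, a minimal sketch of both approaches (the package pin and
   image tag below are only placeholders, not part of this PR):
   
   ```bash
   # 1) Build time: place a requirements.txt in docker-context-files before
   #    building the image from the Airflow Dockerfile (the exact build flags
   #    depend on the Dockerfile version - see the "Building the image" docs).
   mkdir -p docker-context-files
   echo "lxml==4.6.3" > docker-context-files/requirements.txt
   
   # 2) Runtime (quick tests only): pass extra requirements via an environment
   #    variable. Every container installs the packages at start, so this should
   #    not be used in production.
   docker run -it --env "_PIP_ADDITIONAL_REQUIREMENTS=lxml==4.6.3" \
     apache/airflow:2.1.0 version
   ```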
   
   Also updated the documentation to list all development/test
   variables for docker compose and to clarify that the options
   starting with _ are meant to be used only for quick testing.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] tanujdhiman commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
tanujdhiman commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-902160222


   Hmm, actually I tried this command too:
   
   `_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-pydub dropbox patool}`
   But I got an "invalid argument" error on the server.





[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642123379



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -120,8 +120,82 @@ takes precedence over the :envvar:`AIRFLOW__CORE__SQL_ALCHEMY_CONN` variable.
 For newer versions, the ``airflow db check`` command is used, which means that a ``select 1 as is_alive;`` query
 is executed. This also means that you can keep your password in secret backend.
 
+Waits for celery broker connection
+----------------------------------
+
+In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
+commands are used the entrypoint will wait until the celery broker DB connection is available.
+
+The script detects backend type depending on the URL schema and assigns default port numbers if not specified
+in the URL. Then it loops until connection to the host/port specified can be established
+It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
+To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
+
+Supported schemes:
+
+* ``amqp(s)://``  (rabbitmq) - default port 5672
+* ``redis://``               - default port 6379
+* ``postgres://``            - default port 5432
+* ``mysql://``               - default port 3306
+
+Waiting for connection involves checking if a matching port is open.
+The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
+:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
+is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
+as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
+takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+
+.. _entrypoint:commands:
+
+Executing commands
+------------------
+
+If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command
+if you specify extra arguments. For example:
+
+.. code-block:: bash
+
+  docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la"
+  total 16
+  drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
+  drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
+  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
+  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
+
+If first argument is equal to ``python`` - you are dropped in python shell or python commands are executed if
+you pass extra parameters. For example:
+
+.. code-block:: bash
+
+  > docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')"
+  test
+
+If first argument equals to "airflow" - the rest of the arguments is treated as an airflow command
+to execute. Example:
+
+.. code-block:: bash
+
+   docker run -it apache/airflow:2.1.0-python3.6 airflow webserver
+
+If there are any other arguments - they are simply passed to the "airflow" command
+
+.. code-block:: bash
+
+  > docker run -it apache/airflow:2.1.0-python3.6 version
+  2.1.0
+
+Additional quick test options
+-----------------------------
+
+The options below are mostly used for quick testing the image - for example with
+quick-start docker-compose or when you want to perform a local test with new packages
+added. They are not supposed to be run in the production environment as they add additional
+overhead for exacution of additional commands. Those options in production should be executed

Review comment:
       ```suggestion
   overhead for execution of additional commands. Those options in production should be executed
   ```







[GitHub] [airflow] potiuk commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-851068862


   fixed





[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642917919



##########
File path: docs/helm-chart/quick-start.rst
##########
@@ -65,8 +65,17 @@ Run ``kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow``
 to port-forward the Airflow UI to http://localhost:8080/ to confirm
 Airflow is working.
 
-Build a Docker image from your DAGs
------------------------------------
+Extending Airflow Image

Review comment:
    I think this falls into a "quick start" actually. Originally there was only "adding DAGs to your image" here, but I think it will be just as common to add PyPI/Apt dependencies during the "quick start". If you add your DAGs, very likely you want to add a dependency. This was the top ask from Helm chart users - in Slack and elsewhere ("How can I add a new dependency?").
    
    I think many users do not realize how easy it is to build and deploy your image from your local environment (and that is needed sooner or later for Airflow most of the time anyway). I think this was the main reason why @MarkusTeufelberger and others used the "additional Packages" option in the original helm chart - they did not realize (or did not want to jump through the extra hoop) that they can easily (and should) add PyPI/APT packages via a custom image. I think this is the original fallacy of the "dynamic" installation method. You are tempted to do everything via helm chart configuration.
    
    Unfortunately (or fortunately, depending on how you look at it) Kubernetes + Helm + Docker (container) images are all leaky abstractions. You should understand (at least to some extent) all of them and be able to modify all of them when you want to deploy an application via the helm chart. So the docs have to be a little about all of it. By adding it where users will be looking for a "quick start", it is also a bit of educating them that they can and should do it.
    
    Also, the "quick start" here is not a "generic" quick start. This is "Quick Start with kind" - a very specific case where you use `kind` as a test-and-try platform (you should not use `kind` in production - it is intended for test and development). Building and loading images is just a natural part of the flow with kind. And for the "quick start", you should find all the "quick need" answers on that single page rather than somewhere else (even if you could have a link). So I think adding the most common "quick-start" scenarios makes perfect sense, even if it involves building the image and not configuring the helm chart. I thought a bit about this and I am 100% sure this is the best place for it.
    
    For production deployment - yes, there is no need to describe it there; a simple reference to "Building the image" is enough, as for production deployment you have more experience and time to read it.
    
    Also, GitLab is a bit of a different story. GitLab works standalone out-of-the-box for the vast majority of cases. There are very few reasons you would want to build a GitLab image. On the other hand - I cannot imagine a "serious" deployment of Airflow where you would use the "reference" image. You need to learn to build your own Airflow image - the sooner, the better.
   







[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642124128



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,24 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
-
-.. _entrypoint:commands:
-
-Executing commands
-------------------
-
-If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command
-if you specify extra arguments. For example:
-
-.. code-block:: bash
-
-  docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la"
-  total 16
-  drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
-  drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
-
-If first argument is equal to ``python`` - you are dropped in python shell or python commands are executed if
-you pass extra parameters. For example:
-
-.. code-block:: bash
+Installing additional requirements
+..................................
 
-  > docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')"
-  test
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, yoy should create your custom image with dependencies baked in.
 
-If first argument equals to "airflow" - the rest of the arguments is treated as an airflow command
-to execute. Example:
+Example:
 
 .. code-block:: bash
 
-   docker run -it apache/airflow:2.1.0-python3.6 airflow webserver
-
-If there are any other arguments - they are simply passed to the "airflow" command
-
-.. code-block:: bash
+  docker run -it -p 8080:8080 \
+    --env "_PIP_ADDITIONAL_REQUIREMENTS=lxml==4.6.3 charset-normalizer=1.4.1" \

Review comment:
       ```suggestion
       --env "_PIP_ADDITIONAL_REQUIREMENTS=lxml==4.6.3 charset-normalizer==1.4.1" \
   ```







[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642136983



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,24 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
-
-.. _entrypoint:commands:
-
-Executing commands
-------------------
-
-If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command
-if you specify extra arguments. For example:
-
-.. code-block:: bash
-
-  docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la"
-  total 16
-  drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
-  drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
-
-If first argument is equal to ``python`` - you are dropped in python shell or python commands are executed if
-you pass extra parameters. For example:
-
-.. code-block:: bash
+Installing additional requirements
+..................................
 
-  > docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')"
-  test
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is

Review comment:
    I think it is worth adding a warning that not all dependencies can be installed this way. In case of problems, we should consider using the standard method.

##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,24 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
-
-.. _entrypoint:commands:
-
-Executing commands
-------------------
-
-If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command
-if you specify extra arguments. For example:
-
-.. code-block:: bash
-
-  docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la"
-  total 16
-  drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
-  drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
-
-If first argument is equal to ``python`` - you are dropped in python shell or python commands are executed if
-you pass extra parameters. For example:
-
-.. code-block:: bash
+Installing additional requirements
+..................................
 
-  > docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')"
-  test
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is

Review comment:
       I think it is worth adding a warning that not all dependencies can be installed this way. In case of problems, we should consider using the standard method.







[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642882943



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``
+
+* additional steps that should be executed in your image (typically in the form of ``RUN <command>``)
+
+2) Build your image. This can be done with ``docker`` CLI tools and examples below assume ``docker`` is used.
+   There are other tools like ``kaniko`` or ``podman`` that allow you to build the image, but ``docker`` is
+   so far the most popular and developer-friendly tool out there. Typical way of building the image looks
+   like follows (``my-custom-airflow-image-name`` is the custom name your image has). In case you use some
+   kind of registry where you will be using the image from, it is usually named in the form of
+   ``registry/image-name``. The name of the image has to be configured for the deployment method your
+   image will be deployed. This can be set for example as image name in the
+   `docker-compose file <running-airflow-in-docker>`_ or in the `Helm chart <helm-chart>`_.
+
+.. code-block:: shell
+
+   docker build . -f Dockerfile -t my-custom-airflow-image-name

Review comment:
       Yeah. Much nicer now.







[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642579464



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation. 

Review comment:
    I've added both - cross-references as well as more context on why you want to build your own images, together with a very short guide on how to do it, exposing the two most common cases for building the image. I also mentioned `kaniko` and `podman` as alternatives to docker and explained the `load` method available to load the image. Based on the discussion in https://github.com/airflow-helm/charts/issues/211#issuecomment-851421093
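    
    For context, the `load` step referenced here typically looks like the sketch below when using `kind` (the image tag and cluster name are placeholders):
    
    ```bash
    # Build a custom Airflow image locally...
    docker build . --tag my-custom-airflow:0.0.1
    # ...and load it into the kind cluster so the Helm chart can use it
    # without pushing it to a registry.
    kind load docker-image my-custom-airflow:0.0.1 --name airflow-cluster
    ```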







[GitHub] [airflow] mik-laj commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-851732846


   I wonder if we should add new sections to the Helm Chart documentation to better promote this feature. What do you think about creating a new page, e.g. an FAQ, and adding the answer to the question "How to install extra pip packages?"? I am mainly thinking of users migrating from alternative Helm Charts, as [`airflow-helm/airflow`](https://artifacthub.io/packages/helm/airflow-helm/airflow#how-to-install-extra-pip-packages) and [`bitnami/airflow`](https://artifacthub.io/packages/helm/bitnami/airflow#install-extra-python-packages) have such a section.





[GitHub] [airflow] potiuk commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-902159273


   Just set env variables. Here is the first googled result about them https://opensource.com/article/19/8/what-are-environment-variables#:~:text=Environment%20variables%20contain%20information%20about,during%20installation%20or%20user%20creation.
   
   To be perfectly honest - don't even try docker-compose if you find it difficult to set an environment variable when you run a command. This is a really basic thing.
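   
   As a minimal sketch (the package names follow the question above and are only illustrative), the quick-start compose file substitutes the variable from the environment or from the `.env` file next to `docker-compose.yaml`:
   
   ```bash
   # Set the variable in the shell that runs docker compose...
   export _PIP_ADDITIONAL_REQUIREMENTS="pydub dropbox"
   docker-compose up
   
   # ...or add it to the .env file in the same directory as docker-compose.yaml
   echo '_PIP_ADDITIONAL_REQUIREMENTS=pydub dropbox' >> .env
   docker-compose up
   ```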





[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642714658



##########
File path: docs/apache-airflow/start/docker-compose.yaml
##########
@@ -23,16 +23,25 @@
 # This configuration supports basic configuration using environment variables or an .env file
 # The following variables are supported:
 #
-# AIRFLOW_IMAGE_NAME         - Docker image name used to run Airflow.
-#                              Default: apache/airflow:|version|
-# AIRFLOW_UID                - User ID in Airflow containers
-#                              Default: 50000
-# AIRFLOW_GID                - Group ID in Airflow containers
-#                              Default: 50000
-# _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account.
-#                              Default: airflow
-# _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account.
-#                              Default: airflow
+# AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
+#                                Default: apache/airflow:|version|
+# AIRFLOW_UID                  - User ID in Airflow containers
+#                                Default: 50000
+# AIRFLOW_GID                  - Group ID in Airflow containers
+#                                Default: 50000
+#
+# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
+#
+# _AIRFLOW_WWW_USER_CREATE     - Whether to create administrator account.
+#                                Default: true
+# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
+#                                Default: airflow
+# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
+#                                Default: airflow
+# _AIRFLOW_DB_UPGRADE          - Whether to perform DB upgrade in the init container

Review comment:
    This configuration is not customizable by a docker-compose environment variable. It is a docker image variable. It is hardcoded to `true`. See: https://github.com/apache/airflow/blob/fb9822222e809aefde68de8aaae4a6d69edd960f/docs/apache-airflow/start/docker-compose.yaml#L152







[GitHub] [airflow] kaxil commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-851068223


   hmm .. the docs build is failing





[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642138060



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"

Review comment:
       Already added in the meantime







[GitHub] [airflow] tanujdhiman commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
tanujdhiman commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-902142739


   Hello, 
   How can I use this variable?
   
   `_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- pydub dropbox}` 
   
   I'm using this but didn't get good results. I want to install pydub and dropbox through the docker compose file.
   Thanks





[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642607987



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"
+fi
+

Review comment:
       What is the status of the warning? Should we add it or not?







[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642609947



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"
+fi
+

Review comment:
       Ah. I forgot about it.. Sorry. Adding it now.







[GitHub] [airflow] potiuk commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-852052824


   Last comments please :) 





[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642346323



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation. 

Review comment:
    Yes, there is https://airflow.apache.org/docs/docker-stack/build.html
    
    But you might be right that it makes sense to do it in the same PR.







[GitHub] [airflow] potiuk merged pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #16170:
URL: https://github.com/apache/airflow/pull/16170


   





[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642611713



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"
+fi
+

Review comment:
       Added! Thanks for reminding me @mik-laj !







[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642137847



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"

Review comment:
       ```suggestion
       echo "!!! WARNING !!! installing requirements with _PIP_ADDITIONAL_REQUIREMENTS is for testing only!!!!"
       echo 
       echo "Installing dependencies this way has serious performance implications"
       echo
       echo "For production usage make sure to add dependencies to your image"
       echo
       pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"
   ```







[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642346323



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation. 

Review comment:
    Yes, there is https://airflow.apache.org/docs/docker-stack/build.html
    
    I am going to make it a bit better structured.
    
    But you might be right that it makes sense to do it in the same PR.







[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642134765



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,24 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
-
-.. _entrypoint:commands:
-
-Executing commands
-------------------
-
-If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command
-if you specify extra arguments. For example:
-
-.. code-block:: bash
-
-  docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la"
-  total 16
-  drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
-  drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
-
-If first argument is equal to ``python`` - you are dropped in python shell or python commands are executed if
-you pass extra parameters. For example:
-
-.. code-block:: bash
+Installing additional requirements
+..................................
 
-  > docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')"
-  test
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, yoy should create your custom image with dependencies baked in.

Review comment:
       ```suggestion
   finished, you should create your custom image with dependencies baked in.
   ```







[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642346987



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation. 

Review comment:
       I will do it later today :) 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642137352



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"

Review comment:
   I wonder if it is worth adding a warning here that this option has a drastic impact on performance, and recommending migration to the standard method. It will be most painful if the user uses the `airflow.sh` script, as it has been optimized for speed and this option will have a significant impact on it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642877065



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``
+
+* additional steps that should be executed in your image (typically in the form of ``RUN <command>``)
+
+2) Build your image. This can be done with ``docker`` CLI tools and examples below assume ``docker`` is used.
+   There are other tools like ``kaniko`` or ``podman`` that allow you to build the image, but ``docker`` is
+   so far the most popular and developer-friendly tool out there. Typical way of building the image looks
+   like follows (``my-custom-airflow-image-name`` is the custom name your image has). In case you use some
+   kind of registry where you will be using the image from, it is usually named in the form of
+   ``registry/image-name``. The name of the image has to be configured for the deployment method your
+   image will be deployed. This can be set for example as image name in the
+   `docker-compose file <running-airflow-in-docker>`_ or in the `Helm chart <helm-chart>`_.
+
+.. code-block:: shell
+
+   docker build . -f Dockerfile -t my-custom-airflow-image-name

Review comment:
       Good point. We use it elsewhere as well. I will correct it everywhere.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MarkusTeufelberger commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
MarkusTeufelberger commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642845418



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``
+
+* additional steps that should be executed in your image (typically in the form of ``RUN <command>``)
+
+2) Build your image. This can be done with ``docker`` CLI tools and examples below assume ``docker`` is used.
+   There are other tools like ``kaniko`` or ``podman`` that allow you to build the image, but ``docker`` is
+   so far the most popular and developer-friendly tool out there. Typical way of building the image looks
+   like follows (``my-custom-airflow-image-name`` is the custom name your image has). In case you use some
+   kind of registry where you will be using the image from, it is usually named in the form of
+   ``registry/image-name``. The name of the image has to be configured for the deployment method your
+   image will be deployed. This can be set for example as image name in the
+   `docker-compose file <running-airflow-in-docker>`_ or in the `Helm chart <helm-chart>`_.
+
+.. code-block:: shell
+
+   docker build . -f Dockerfile -t my-custom-airflow-image-name

Review comment:
       This will set the tag "latest" for that image and overwrite any image you have built previously. Please set a tag and mention a version number in that tag.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MarkusTeufelberger commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
MarkusTeufelberger commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642596616



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation. 

Review comment:
       "Loading directly to a cluster" is not a very typical feature of production clusters - as you saw in the talos case it is more likely that there is a registry somewhere (typically artifactory or similar) instead. I would remove that sentence with and only mention registries.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642870152



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``

Review comment:
       Good point. Changing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642134823



##########
File path: docs/docker-stack/build.rst
##########
@@ -262,6 +263,12 @@ of constraints that you manually prepared.
 You can read more about constraints in the documentation of the
 `Installation <http://airflow.apache.org/docs/apache-airflow/stable/installation.html#constraints-files>`_
 
+Note that if you place ``requirements.txt`` in the ``docker-context-files`` folder, it will be
+used to install all requirements declared there. It is recommended that the file
+contains specified version of dependencies to add with ``==`` version specifier, to achieve
+reproducibility of the build.

Review comment:
       ```suggestion
   reproducible builds.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-851069109


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
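   A minimal sketch of that rebase-and-push flow (the remote and branch names are assumptions):

   ```bash
   # Rebase the PR branch on the latest master and push it back; --force-with-lease
   # refuses to overwrite remote work that has not been fetched yet.
   git fetch upstream
   git rebase upstream/master
   git push --force-with-lease origin my-feature-branch
   ```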


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r643029023



##########
File path: docs/docker-stack/docker-examples/customizing/github-different-repository.sh
##########
@@ -26,6 +26,6 @@ docker build . \
     --build-arg AIRFLOW_INSTALLATION_METHOD="https://github.com/potiuk/airflow/archive/main.tar.gz#egg=apache-airflow" \
     --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-main" \
     --build-arg CONSTRAINTS_GITHUB_REPOSITORY="potiuk/airflow" \
-    --tag "$(basename "$0")"
+    --tag "github-different-repository-image:0.0.1"
 # [END build]
-docker rmi --force "$(basename "$0")"
+docker rmi --force "v"

Review comment:
       Ctrl-V -> v




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] tanujdhiman commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
tanujdhiman commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-902167930


   Thanks @potiuk 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642137199



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,24 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
-
-.. _entrypoint:commands:
-
-Executing commands
-------------------
-
-If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command
-if you specify extra arguments. For example:
-
-.. code-block:: bash
-
-  docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la"
-  total 16
-  drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
-  drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
-
-If first argument is equal to ``python`` - you are dropped in python shell or python commands are executed if
-you pass extra parameters. For example:
-
-.. code-block:: bash
+Installing additional requirements
+..................................
 
-  > docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')"
-  test
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is

Review comment:
       True.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642124160



##########
File path: docs/apache-airflow/start/docker.rst
##########
@@ -260,13 +260,32 @@ runtime user id which is unknown at the time of building the image.
 |                                | you want to use different UID than default it must  |                          |
 |                                | be set to ``0``.                                    |                          |
 +--------------------------------+-----------------------------------------------------+--------------------------+
-| ``_AIRFLOW_WWW_USER_USERNAME`` | Username for the administrator UI account.          |                          |
-|                                | If this value is specified, admin UI user gets      |                          |
-|                                | created automatically. This is only useful when     |                          |
-|                                | you want to run Airflow for a test-drive and        |                          |
-|                                | want to start a container with embedded development |                          |
-|                                | database.                                           |                          |
-+--------------------------------+-----------------------------------------------------+--------------------------+
-| ``_AIRFLOW_WWW_USER_PASSWORD`` | Password for the administrator UI account.          |                          |
-|                                | Only used when ``_AIRFLOW_WWW_USER_USERNAME`` set.  |                          |
-+--------------------------------+-----------------------------------------------------+--------------------------+
+
+Those additional variables are useful in case you are trying out/testing Airflow installation via docker compose.
+They are not intended to be used in production, but they make the environment nicer to bootstrap for first time
+users.
+
++----------------------------------+-----------------------------------------------------+--------------------------+
+|   Variable                       | Description                                         | Default                  |
++==================================+=====================================================+==========================+
+| ``_AIRFLOW_DB_UPGRADE``          | If not empty, the init container will attempt to    | true                     |
+|                                  | upgrade the database of Airflow.                    |                          |
++----------------------------------+-----------------------------------------------------+--------------------------+
+| ``_AIRFLOW_WWW_USER_CREATE``     | If not empty, the init container will attempt to    | true                     |
+|                                  | create the administrator use.                       |                          |
++----------------------------------+-----------------------------------------------------+--------------------------+
+| ``_AIRFLOW_WWW_USER_USERNAME``   | Username for the administrator UI account.          | airflow                  |
+|                                  | If this value is specified, admin UI user gets      |                          |
+|                                  | created automatically. This is only useful when     |                          |
+|                                  | you want to run Airflow for a test-drive and        |                          |
+|                                  | want to start a container with embedded development |                          |
+|                                  | database.                                           |                          |
++----------------------------------+-----------------------------------------------------+--------------------------+
+| ``_AIRFLOW_WWW_USER_PASSWORD``   | Password for the administrator UI account.          | airflow                  |
+|                                  | Only used when ``_AIRFLOW_WWW_USER_USERNAME`` set.  |                          |
++----------------------------------+-----------------------------------------------------+--------------------------+
+| ``_PIP_ADDITIONAL_REQUIREMENTS`` | If not empty, airflow containers will attempt to    |                          |
+|                                  | install requirements specified in the variable.     |                          |
+|                                  | example: ``lxml==4.6.3 charset-normalizer=1.4.1``.  |                          |

Review comment:
       ```suggestion
   |                                  | example: ``lxml==4.6.3 charset-normalizer==1.4.1``. |                          |
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642138137



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,16 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    >&2 echo "!!! WARNING !!! installing requirements with _PIP_ADDITIONAL_REQUIREMENTS is for testing only!!!!"
+    >&2 echo 
+    >&2 echo "Installing dependencies this way has serious performance implications"
+    >&2 echo
+    >&2 echo "For production usage make sure to add dependencies to your image"
+   >&2  echo

Review comment:
       ```suggestion
       >&2  echo
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642123454



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -120,8 +120,82 @@ takes precedence over the :envvar:`AIRFLOW__CORE__SQL_ALCHEMY_CONN` variable.
 For newer versions, the ``airflow db check`` command is used, which means that a ``select 1 as is_alive;`` query
 is executed. This also means that you can keep your password in secret backend.
 
+Waits for celery broker connection
+----------------------------------
+
+In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
+commands are used the entrypoint will wait until the celery broker DB connection is available.
+
+The script detects backend type depending on the URL schema and assigns default port numbers if not specified
+in the URL. Then it loops until connection to the host/port specified can be established
+It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
+To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
+
+Supported schemes:
+
+* ``amqp(s)://``  (rabbitmq) - default port 5672
+* ``redis://``               - default port 6379
+* ``postgres://``            - default port 5432
+* ``mysql://``               - default port 3306
+
+Waiting for connection involves checking if a matching port is open.
+The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
+:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
+is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
+as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
+takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+
+.. _entrypoint:commands:
+
+Executing commands
+------------------
+
+If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command
+if you specify extra arguments. For example:
+
+.. code-block:: bash
+
+  docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la"
+  total 16
+  drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
+  drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
+  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
+  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
+
+If first argument is equal to ``python`` - you are dropped in python shell or python commands are executed if
+you pass extra parameters. For example:
+
+.. code-block:: bash
+
+  > docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')"
+  test
+
+If first argument equals to "airflow" - the rest of the arguments is treated as an airflow command
+to execute. Example:
+
+.. code-block:: bash
+
+   docker run -it apache/airflow:2.1.0-python3.6 airflow webserver
+
+If there are any other arguments - they are simply passed to the "airflow" command
+
+.. code-block:: bash
+
+  > docker run -it apache/airflow:2.1.0-python3.6 version
+  2.1.0
+
+Additional quick test options
+-----------------------------
+
+The options below are mostly used for quick testing the image - for example with
+quick-start docker-compose or when you want to perform a local test with new packages
+added. They are not supposed to be run in the production environment as they add additional
+overhead for execution of additional commands. Those options in production should be realized
+either as maintenance operations on the database or should be embedded in teh custom image used

Review comment:
       ```suggestion
   either as maintenance operations on the database or should be embedded in the custom image used
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642139621



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,16 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    >&2 echo "!!! WARNING !!! installing requirements with _PIP_ADDITIONAL_REQUIREMENTS is for testing only!!!!"
+    >&2 echo 
+    >&2 echo "Installing dependencies this way has serious performance implications"
+    >&2 echo
+    >&2 echo "For production usage make sure to add dependencies to your image"
+    >&2  echo

Review comment:
       ```suggestion
       >&2 echo
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] tanujdhiman edited a comment on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
tanujdhiman edited a comment on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-902142739


   Hello, 
   How can I use this variable?
   
   `_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- pydub dropbox}` 
   
   I'm using this but didn't get good results. I want to install pydub and dropbox through the docker compose file. Is this the right syntax, or what's wrong with it?
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642612961



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"
+fi
+

Review comment:
       I think it was lost during rebase :(




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642137498



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,24 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
-
-.. _entrypoint:commands:
-
-Executing commands
-------------------
-
-If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command
-if you specify extra arguments. For example:
-
-.. code-block:: bash
-
-  docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la"
-  total 16
-  drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
-  drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
-  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
-
-If first argument is equal to ``python`` - you are dropped in python shell or python commands are executed if
-you pass extra parameters. For example:
-
-.. code-block:: bash
+Installing additional requirements
+..................................
 
-  > docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')"
-  test
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.

Review comment:
       ```suggestion
   finished, you should create your custom image with dependencies baked in.
   
   Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
   because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
   to ``customizing image`` - this is the only good way to install dependencies that require compilation. 
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-851851457


   > I wonder if we should add new sections to the Helm Chart documentation to better promote this feature. What do you think to create a new page, e.g. FAQ and add the answer to the question "How to install extra pip packages?" ? I mainly think of users migrating from alternative Helm Charts as [`airflow-helm/airflow`](https://artifacthub.io/packages/helm/airflow-helm/airflow#how-to-install-extra-pip-packages) or [`bitnami/airflow`](https://artifacthub.io/packages/helm/bitnami/airflow#install-extra-python-packages) has such a section.
   
   Good point @mik-laj.
   
   But rather than creating a new section, I extended two sections in the helm chart docs:
   
   * production deployment - I mentioned typical scenarios when you need a custom image and referred to "Build Images" for details
   
   * quick-start with kind - I copied a few typical examples (adding apt/PyPI packages) next to "Adding DAGs" with step-by-step instructions on how to build the images. This would be a gentle introduction to image building for users who do not know how to do it, or who do not even know that they could build and use their own image easily (plus a "see more" reference to "Build Images").
   
   I think that hits the sweet spot between copy/pasting some parts of the documentation where users might need it and having a common source of examples.
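   As a rough sketch of what those quick-start examples look like (the base image tag, the apt/PyPI packages and the root/airflow user switch below are illustrative assumptions, not the exact text added to the chart docs):

   ```bash
   # Bake extra apt and PyPI packages into a custom image and build it with an
   # explicit version tag, so it can be used by the quick-start deployments.
   cat > Dockerfile <<'EOF'
   FROM apache/airflow:2.1.0-python3.8
   USER root
   RUN apt-get update \
       && apt-get install -y --no-install-recommends build-essential \
       && apt-get clean && rm -rf /var/lib/apt/lists/*
   USER airflow
   RUN pip install --no-cache-dir lxml==4.6.3 charset-normalizer==1.4.1
   EOF
   docker build . -f Dockerfile -t my-custom-airflow-image-name:0.0.1
   ```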
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642609374



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation. 

Review comment:
   I think this is a very useful case to mention - for `minikube` and `kind` users. Users need to be aware of the options they have in different situations. We have to remember that this documentation is for different kinds of users (for example, in the same PR we added _PIP_ADDITIONAL_REQUIREMENTS for those kinds of users - which should never be considered production use).
   
   If we add this, there is no reason we should not add the other. 
   
   I modified it a bit and spelled out minikube and kind explicitly. I also thought about it a bit and added the Talos case as another registry-less way. It is really interesting how they implemented pass-through to the local docker cluster cache and I like it a lot - it's better than the `load` methods of kind and minikube - still providing registry-less usage of locally built images. It allows even faster iterations (not to mention the air-gapped use, which is super important for some of our users, as we've learned). It was cool to learn about that.
   
   So finally we have four methods - each for a different purpose and with different requirements/dependencies.
   
   ```
      * For ``docker-compose`` deployment, that's all you need. The image is stored in docker engine cache
        and docker compose will use it from there.
   
      * For some Kubernetes deployments - development-targeted clusters - you can load the images directly into
        the Kubernetes cluster. Clusters such as `kind` or `minikube` have a dedicated ``load`` command to load
        the images into the cluster.

      * In some cases (for example in `Talos <https://www.talos.dev/docs/v0.7/guides/configuring-pull-through-cache/#using-caching-registries-with-docker-local-cluster>`_)
        you can configure the Kubernetes cluster to also use the local docker cache rather than a remote registry - this is
        very similar to the Docker-Compose case and it is often used in air-gapped systems to provide
        the Kubernetes cluster with access to container images.

      * Last but not least - you can push your image to a remote registry, which is the most common way
        of storing and exposing the images, and it is the most portable way of publishing the image. Both
        Docker-Compose and Kubernetes can make use of images exposed via registries.
   ```
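   For the `kind` and `minikube` cases above, loading a locally built image is typically a single command (the image name and tag are the same assumed example as earlier; exact CLI syntax may differ between versions):

   ```bash
   # Make a locally built image available inside the cluster without any registry.
   kind load docker-image my-custom-airflow-image-name:0.0.1
   minikube image load my-custom-airflow-image-name:0.0.1
   ```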




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642623460



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation. 

Review comment:
   Indeed, thanks for pointing that out. I misunderstood the last point, where they were talking about ``docker clusters`` when they really meant ``docker registries``. In that case it is not as useful as I assumed for the build scenario, so I removed it entirely.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642137553



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"

Review comment:
       Added




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-902159273


   Just set env variables. Here is the first googled result about them https://opensource.com/article/19/8/what-are-environment-variables#:~:text=Environment%20variables%20contain%20information%20about,during%20installation%20or%20user%20creation.
   
   To be perfectly honest - don't even try docker-compose if you find it difficult to set an environment variable when you run a command. This is a really basic thing. You should try to understand what you are doing and not ask someone to copy solutions to the most basic things for you.
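   For the docker-compose quick start this boils down to something like the following (the package names are just the ones from the question above):

   ```bash
   # Set the variable for a single docker-compose invocation; the Airflow containers
   # will then pip-install these packages on startup, as described in the entrypoint docs.
   _PIP_ADDITIONAL_REQUIREMENTS="pydub dropbox patool" docker-compose up -d
   ```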


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-902159273


   Just set env variables. Here is the first googled result about them https://opensource.com/article/19/8/what-are-environment-variables#:~:text=Environment%20variables%20contain%20information%20about,during%20installation%20or%20user%20creation.
   
   To be perfectly honest - don't even try docker-compose if you find it difficult to set an environment variable when you run a command. This is a really basic thing. You should try to understand that and not ask someone to copy solutions to the most basic things for you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642123268



##########
File path: docs/apache-airflow/start/docker.rst
##########
@@ -231,7 +231,7 @@ Environment variables supported by Docker Compose
 =================================================
 
 Do not confuse the variable names here with the build arguments set when image is built. The
-``AIRFLOW_UID`` and ``AIRFLOW_GID`` build args default to ``50000`` when the image is built, so they are
+``AIRFLOW_UID`` and ``AIRFLOW_GID`` build args default to ``0`` when the image is built, so they are

Review comment:
       ```suggestion
   ``AIRFLOW_UID`` and ``AIRFLOW_GID`` build args default to ``50000`` when the image is built, so they are
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642138014



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,16 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    echo "!!! WARNING !!! installing requirements with _PIP_ADDITIONAL_REQUIREMENTS is for testing only!!!!"
+    echo 
+    echo "Installing dependencies this way has serious performance implications"
+    echo
+    echo "For production usage make sure to add dependencies to your image"
+    echo

Review comment:
       ```suggestion
       >&2 echo "!!! WARNING !!! installing requirements with _PIP_ADDITIONAL_REQUIREMENTS is for testing only!!!!"
       >&2 echo 
       >&2 echo "Installing dependencies this way has serious performance implications"
       >&2 echo
       >&2 echo "For production usage make sure to add dependencies to your image"
       >&2 echo
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642123702



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"

Review comment:
       ```suggestion
       pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-851612630


   I also renamed some chapters and copied the "Embedding DAG" section as yet another often-used simple case.
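   
   As a rough illustration of that "Embedding DAG" case - a sketch only, where the base image tag and the local folder layout are assumptions rather than something taken from this PR:
   
   ```shell
   # Hypothetical Dockerfile that bakes local DAG files into a custom image
   printf '%s\n' \
       'FROM apache/airflow:2.1.0-python3.8' \
       'COPY ./dags/ /opt/airflow/dags/' \
       > Dockerfile
   docker build . -f Dockerfile -t my-airflow-with-dags:0.0.1
   ```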


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r643025648



##########
File path: docs/docker-stack/docker-examples/customizing/github-different-repository.sh
##########
@@ -26,6 +26,6 @@ docker build . \
     --build-arg AIRFLOW_INSTALLATION_METHOD="https://github.com/potiuk/airflow/archive/main.tar.gz#egg=apache-airflow" \
     --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-main" \
     --build-arg CONSTRAINTS_GITHUB_REPOSITORY="potiuk/airflow" \
-    --tag "$(basename "$0")"
+    --tag "github-different-repository-image:0.0.1"
 # [END build]
-docker rmi --force "$(basename "$0")"
+docker rmi --force "v"

Review comment:
       Is it correct?
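   
   Judging from the tag set in the `docker build` command above, the intended cleanup presumably looks something like the following - an inference from the diff, not a confirmed fix:
   
   ```shell
   docker rmi --force "github-different-repository-image:0.0.1"
   ```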




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MarkusTeufelberger commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
MarkusTeufelberger commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642844439



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``

Review comment:
       "Latest" is not a good place to start from to be honest since it doesn't carry any semantic meaning and changes often depending on what a project understands by "latest". Please consider a warning against latest and recommend using only versioned base containers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642137824



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"

Review comment:
       ```suggestion
       >&2 echo "The container installs new packages at startup. For a better performance/less network overhead, consider bake dependencies to image."
       pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642623460



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation. 

Review comment:
       Indeed, thanks for pointing that out. I misunderstood the last point where they were talking about ``docker clusters`` when they really meant ``docker registries``. In that case it is not as useful as I assumed for the build scenario, so I removed it entirely.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642899880



##########
File path: docs/helm-chart/quick-start.rst
##########
@@ -65,8 +65,17 @@ Run ``kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow``
 to port-forward the Airflow UI to http://localhost:8080/ to confirm
 Airflow is working.
 
-Build a Docker image from your DAGs
------------------------------------
+Extending Airflow Image

Review comment:
       I have mixed feelings about this. This is a quick start guide for the Helm Chart, and I think we should be focusing on that. For the same reason, we do not describe image building in the [Docker-compose guide](http://airflow.apache.org/docs/apache-airflow/stable/start/docker.html). Now over 60% of the content of this guide is not about the Helm Chart but about the image.
   
   The Helm Chart for GitLab has a separate guide [for custom images](https://docs.gitlab.com/charts/advanced/custom-images/index.html). What do you think about that? They also have [a quick start guide](https://docs.gitlab.com/charts/quickstart/index.html).
   
   The Helm Chart for JupyterHub also has comprehensive documentation with a [guide for custom images](https://zero-to-jupyterhub.readthedocs.io/en/latest/jupyterhub/customizing/user-environment.html#choose-and-use-an-existing-docker-image)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642871113



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``

Review comment:
       BTW. We have very strict rules about latest in Airflow - latest is always the latest "stable/released" build. But you are right that we should recommend a fixed tag.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642137624



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"

Review comment:
       ```
           >&2 echo "The container installs new packages at startup. For a better performance/less network overhead , consider bake dependencies to image."
   ```

##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"

Review comment:
       ```
           >&2 echo "The container installs new packages at startup. For a better performance/less network overhead, consider bake dependencies to image."
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MarkusTeufelberger commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
MarkusTeufelberger commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642849135



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``
+
+* additional steps that should be executed in your image (typically in the form of ``RUN <command>``)
+
+2) Build your image. This can be done with ``docker`` CLI tools and examples below assume ``docker`` is used.
+   There are other tools like ``kaniko`` or ``podman`` that allow you to build the image, but ``docker`` is
+   so far the most popular and developer-friendly tool out there. Typical way of building the image looks
+   like follows (``my-custom-airflow-image-name`` is the custom name your image has). In case you use some
+   kind of registry where you will be using the image from, it is usually named in the form of
+   ``registry/image-name``. The name of the image has to be configured for the deployment method your
+   image will be deployed. This can be set for example as image name in the
+   `docker-compose file <running-airflow-in-docker>`_ or in the `Helm chart <helm-chart>`_.
+
+.. code-block:: shell
+
+   docker build . -f Dockerfile -t my-custom-airflow-image-name
+
+
+3) Once you build the image locally you have usually several options to make them available for your deployment:

Review comment:
       Missing step: Test that your custom image actually works using the built-in test suite/CI tests/??? before deploying. This guide would just overwrite the existing docker image and you'd be running the latest version of Airflow without ever running any tests.
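   
   A minimal smoke test along those lines could look like this - a sketch, assuming the image name used earlier in this guide and a `dropbox` dependency as an example of an added package:
   
   ```shell
   # Quick sanity checks before pushing or deploying the freshly built image
   docker run --rm my-custom-airflow-image-name airflow version
   docker run --rm my-custom-airflow-image-name python -c "import dropbox"
   ```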




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MarkusTeufelberger commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
MarkusTeufelberger commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642344319



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essentials`` installed. In case you get compilation problem, you should revert
+to ``customizing image`` - this is the only good way to install dependencies that require compilation. 

Review comment:
       Is there documentation or examples on how to create these custom images and how to keep them up-to-date? If yes, it might be worth linking there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642917919



##########
File path: docs/helm-chart/quick-start.rst
##########
@@ -65,8 +65,17 @@ Run ``kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow``
 to port-forward the Airflow UI to http://localhost:8080/ to confirm
 Airflow is working.
 
-Build a Docker image from your DAGs
------------------------------------
+Extending Airflow Image

Review comment:
       I think this falls into a "quick start" actually. Originally there was only "adding DAGs to your image" here, but I think it will be just as common to add PyPI/apt dependencies during the "quick start". If you add your DAGs, very likely you want to add a dependency. This was the top ask from Helm chart users - in Slack and elsewhere ("How can I add a new dependency?").
   
   I think many users do not realize how easy it is to build and deploy your image from your local environment (and that is needed sooner or later most of the time anyway for Airflow). I think this was the main reason why @MarkusTeufelberger and others used the "additional Packages" option in the original helm chart - they did not realize that they can easily add PyPI/APT packages via a custom image. I think this is the original fallacy of the "dynamic" installation method. You are tempted to do everything by helm chart configuration.
   
   Unfortunately (or fortunately, depending on how you look at it) Kubernetes + Helm + Docker (Container) images are all a leaky abstraction. You should understand (at least to some extent) all of it and be able to modify all of it when you want to deploy the application via the helm chart. So the docs have to be a little about all of it. By adding it where they will be looking for a "quick start" it's also a bit of educating the users that they can and should do it.
   
   Also, the "quick start" here is not a "generic" quick start. This is "Quick Start with kind" - a very specific case, where you use `kind` as a test-and-try platform (you should not use `kind` in production - it is intended for test and development). Building and loading images in kind is just a natural part of the flow with kind. And for the "quick start", you should find all the "quick need" answers on that single page rather than somewhere else (even if you could have a link). So I think adding the most common "quick-start" scenarios makes perfect sense, even if it involves building the image and not configuring the helm chart. I thought a bit about this and I am 100% sure this is the best place for it.
   
   For production deployment - yes, there is no need to describe it there; a simple reference to "Building the image" is enough, as for production deployment you have more experience and time to read it.
   
   Also, GitLab is a bit of a different story. GitLab works standalone out-of-the-box for the vast majority of cases. There are very few reasons you would want to build a GitLab image. On the other hand - I cannot imagine a "serious" deployment of Airflow where you would use the "reference" image. You need to learn to build your own Airflow image - the sooner, the better.
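   
   To make the `kind` flow mentioned above concrete, a sketch (the image name, tag and release name are illustrative, and the `images.airflow.*` values are assumed from the chart documentation):
   
   ```shell
   # Build the custom image, load it into the kind cluster and point the chart at it
   docker build . -t my-airflow:0.0.1
   kind load docker-image my-airflow:0.0.1
   helm upgrade --install airflow apache-airflow/airflow \
       --set images.airflow.repository=my-airflow \
       --set images.airflow.tag=0.0.1
   ```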
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MarkusTeufelberger commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
MarkusTeufelberger commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642846563



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``
+
+* additional steps that should be executed in your image (typically in the form of ``RUN <command>``)
+
+2) Build your image. This can be done with ``docker`` CLI tools and examples below assume ``docker`` is used.
+   There are other tools like ``kaniko`` or ``podman`` that allow you to build the image, but ``docker`` is
+   so far the most popular and developer-friendly tool out there. Typical way of building the image looks
+   like follows (``my-custom-airflow-image-name`` is the custom name your image has). In case you use some
+   kind of registry where you will be using the image from, it is usually named in the form of
+   ``registry/image-name``. The name of the image has to be configured for the deployment method your
+   image will be deployed. This can be set for example as image name in the
+   `docker-compose file <running-airflow-in-docker>`_ or in the `Helm chart <helm-chart>`_.
+
+.. code-block:: shell
+
+   docker build . -f Dockerfile -t my-custom-airflow-image-name
+
+
+3) Once you build the image locally you have usually several options to make them available for your deployment:
+
+* For ``docker-compose`` deployment, that's all you need. The image is stored in docker engine cache
+  and docker compose will use it from there.
+
+* For some - development targeted - Kubernetes deployments you can load the images directly to
+  Kubernetes clusters. Clusters such as ``kind`` or ``minikube`` have dedicated ``load`` method to load the
+  images to the cluster.
+
+* Last but not least - you can push your image to a remote registry which is the most common way
+  of storing and exposing the images, and it is most portable way of publishing the image. Both
+  Docker-Compose and Kubernetes can make use of images exposed via registries.
+
+The most common scenarios where you want to build your own image are adding a new ``apt`` package,
+adding a new ``pip`` dependency and embedding DAGs into the image.

Review comment:
       Similarly here: pip is a tool, not the name of the dependencies in Python.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642714658



##########
File path: docs/apache-airflow/start/docker-compose.yaml
##########
@@ -23,16 +23,25 @@
 # This configuration supports basic configuration using environment variables or an .env file
 # The following variables are supported:
 #
-# AIRFLOW_IMAGE_NAME         - Docker image name used to run Airflow.
-#                              Default: apache/airflow:|version|
-# AIRFLOW_UID                - User ID in Airflow containers
-#                              Default: 50000
-# AIRFLOW_GID                - Group ID in Airflow containers
-#                              Default: 50000
-# _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account.
-#                              Default: airflow
-# _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account.
-#                              Default: airflow
+# AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
+#                                Default: apache/airflow:|version|
+# AIRFLOW_UID                  - User ID in Airflow containers
+#                                Default: 50000
+# AIRFLOW_GID                  - Group ID in Airflow containers
+#                                Default: 50000
+#
+# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
+#
+# _AIRFLOW_WWW_USER_CREATE     - Whether to create administrator account.
+#                                Default: true
+# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
+#                                Default: airflow
+# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
+#                                Default: airflow
+# _AIRFLOW_DB_UPGRADE          - Whether to perform DB upgrade in the init container

Review comment:
       This configuration is not customizable by a docker-compose environment variable. It is a docker image variable. It is hardcoded to true. See: https://github.com/apache/airflow/blob/fb9822222e809aefde68de8aaae4a6d69edd960f/docs/apache-airflow/start/docker-compose.yaml#L152




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642810142



##########
File path: docs/apache-airflow/start/docker-compose.yaml
##########
@@ -23,16 +23,25 @@
 # This configuration supports basic configuration using environment variables or an .env file
 # The following variables are supported:
 #
-# AIRFLOW_IMAGE_NAME         - Docker image name used to run Airflow.
-#                              Default: apache/airflow:|version|
-# AIRFLOW_UID                - User ID in Airflow containers
-#                              Default: 50000
-# AIRFLOW_GID                - Group ID in Airflow containers
-#                              Default: 50000
-# _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account.
-#                              Default: airflow
-# _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account.
-#                              Default: airflow
+# AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
+#                                Default: apache/airflow:|version|
+# AIRFLOW_UID                  - User ID in Airflow containers
+#                                Default: 50000
+# AIRFLOW_GID                  - Group ID in Airflow containers
+#                                Default: 50000
+#
+# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
+#
+# _AIRFLOW_WWW_USER_CREATE     - Whether to create administrator account.
+#                                Default: true
+# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
+#                                Default: airflow
+# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
+#                                Default: airflow
+# _AIRFLOW_DB_UPGRADE          - Whether to perform DB upgrade in the init container

Review comment:
       Yep. Removed it (also the "create user" one), which was the same.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-902150123


   https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#environment-variables-supported-by-docker-compose


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] tanujdhiman commented on pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
tanujdhiman commented on pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#issuecomment-902156327


   Thanks @potiuk for the quick reply. I did see that page, actually, but umm, okay.
   
   Can you write down the command that I should put in the compose file to install `pydub, dropbox, patool`?
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MarkusTeufelberger commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
MarkusTeufelberger commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642846193



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``
+
+* additional steps that should be executed in your image (typically in the form of ``RUN <command>``)
+
+2) Build your image. This can be done with ``docker`` CLI tools and examples below assume ``docker`` is used.
+   There are other tools like ``kaniko`` or ``podman`` that allow you to build the image, but ``docker`` is
+   so far the most popular and developer-friendly tool out there. Typical way of building the image looks
+   like follows (``my-custom-airflow-image-name`` is the custom name your image has). In case you use some
+   kind of registry where you will be using the image from, it is usually named in the form of
+   ``registry/image-name``. The name of the image has to be configured for the deployment method your
+   image will be deployed. This can be set for example as image name in the
+   `docker-compose file <running-airflow-in-docker>`_ or in the `Helm chart <helm-chart>`_.
+
+.. code-block:: shell
+
+   docker build . -f Dockerfile -t my-custom-airflow-image-name
+
+
+3) Once you build the image locally you have usually several options to make them available for your deployment:
+
+* For ``docker-compose`` deployment, that's all you need. The image is stored in docker engine cache
+  and docker compose will use it from there.
+
+* For some - development targeted - Kubernetes deployments you can load the images directly to
+  Kubernetes clusters. Clusters such as ``kind`` or ``minikube`` have dedicated ``load`` method to load the
+  images to the cluster.
+
+* Last but not least - you can push your image to a remote registry which is the most common way
+  of storing and exposing the images, and it is most portable way of publishing the image. Both
+  Docker-Compose and Kubernetes can make use of images exposed via registries.
+
+The most common scenarios where you want to build your own image are adding a new ``apt`` package,

Review comment:
       deb, not apt. Apt is the package manager, deb is the format.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642888926



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only use installing dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several most-typical scenarios that you will encounter and here is a quick recipe on how to achieve
+your goal quickly. In order to understand details you can read further, but for the simple cases using
+typical tools here are the simple examples.
+
+In the simplest case building your image consists of those steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information what your image should be based on (for example ``FROM: apache/airflow:latest-python3.8``
+
+* additional steps that should be executed in your image (typically in the form of ``RUN <command>``)
+
+2) Build your image. This can be done with ``docker`` CLI tools and examples below assume ``docker`` is used.
+   There are other tools like ``kaniko`` or ``podman`` that allow you to build the image, but ``docker`` is
+   so far the most popular and developer-friendly tool out there. Typical way of building the image looks
+   like follows (``my-custom-airflow-image-name`` is the custom name your image has). In case you use some
+   kind of registry where you will be using the image from, it is usually named in the form of
+   ``registry/image-name``. The name of the image has to be configured for the deployment method your
+   image will be deployed. This can be set for example as image name in the
+   `docker-compose file <running-airflow-in-docker>`_ or in the `Helm chart <helm-chart>`_.
+
+.. code-block:: shell
+
+   docker build . -f Dockerfile -t my-custom-airflow-image-name
+
+
+3) Once you build the image locally you have usually several options to make them available for your deployment:
+
+* For ``docker-compose`` deployment, that's all you need. The image is stored in docker engine cache
+  and docker compose will use it from there.
+
+* For some - development targeted - Kubernetes deployments you can load the images directly to
+  Kubernetes clusters. Clusters such as ``kind`` or ``minikube`` have dedicated ``load`` method to load the
+  images to the cluster.
+
+* Last but not least - you can push your image to a remote registry which is the most common way
+  of storing and exposing the images, and it is most portable way of publishing the image. Both
+  Docker-Compose and Kubernetes can make use of images exposed via registries.
+
+The most common scenarios where you want to build your own image are adding a new ``apt`` package,
+adding a new ``pip`` dependency and embedding DAGs into the image.

Review comment:
       Here, yes - it should be `PyPI`, which we use all over our code. Correcting.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642891577



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive-deeply in the way how the Airflow Image is build, let us first explain why you might need
+to build the custom container image and we show a few typical ways you can do it.
+
+Why custom image ?
+------------------
+
+The Apache Airflow community, releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone, sometimes others extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only install dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several typical scenarios that you will encounter, and here is a quick recipe on how to achieve
+your goal quickly. You can read further to understand the details, but for the simple cases using
+typical tools, here are some simple examples.
+
+In the simplest case, building your image consists of these steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information about what your image should be based on (for example ``FROM apache/airflow:latest-python3.8``)
+
+* additional steps that should be executed in your image (typically in the form of ``RUN <command>``)
+
+2) Build your image. This can be done with ``docker`` CLI tools and examples below assume ``docker`` is used.
+   There are other tools like ``kaniko`` or ``podman`` that allow you to build the image, but ``docker`` is
+   so far the most popular and developer-friendly tool out there. A typical way of building the image looks
+   as follows (``my-custom-airflow-image-name`` is the custom name of your image). In case you use some
+   kind of registry from which you will be using the image, it is usually named in the form of
+   ``registry/image-name``. The name of the image has to be configured for the deployment method with which
+   your image will be deployed. This can be set for example as the image name in the
+   `docker-compose file <running-airflow-in-docker>`_ or in the `Helm chart <helm-chart>`_.
+
+.. code-block:: shell
+
+   docker build . -f Dockerfile -t my-custom-airflow-image-name
+
+
+3) Once you build the image locally, you usually have several options to make it available for your deployment:

Review comment:
       Very good point. You need Airflow sources for that (this is not a standalone tool but it requires airflow to be checked out). But it's worth mentioning as an optional step.
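   For example, loading a locally built image could look roughly like this (a sketch only - the image and cluster names are placeholders):

   ```bash
   # Load a locally built image into a kind cluster
   kind load docker-image my-custom-airflow-image-name --name airflow-cluster

   # Or, with a recent minikube
   minikube image load my-custom-airflow-image-name
   ```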




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642888171



##########
File path: docs/docker-stack/build.rst
##########
@@ -15,16 +15,126 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _build:build_image:
+
 Building the image
 ==================
 
-Before you dive-deeply in the way how the Airflow Image is build, named and why we are doing it the
-way we do, you might want to know very quickly how you can extend or customize the existing image
-for Apache Airflow. This chapter gives you a short answer to those questions.
+Before you dive deeply into how the Airflow image is built, let us first explain why you might need
+to build a custom container image, and show a few typical ways you can do it.
+
+Why custom image?
+------------------
+
+The Apache Airflow community releases Docker Images which are ``reference images`` for Apache Airflow.
+However, Airflow has more than 60 community-managed providers (installable via extras) and some of the
+default extras/providers installed are not used by everyone; sometimes other extras/providers
+are needed, sometimes (very often actually) you need to add your own custom dependencies,
+packages or even custom providers.
+
+In Kubernetes and Docker terms this means that you need another image with your specific requirements.
+This is why you should learn how to build your own Docker (or more properly Container) image.
+You might be tempted to use the ``reference image`` and dynamically install the new packages while
+starting your containers, but this is a bad idea for multiple reasons - starting from fragility of the build
+and ending with the extra time needed to install those packages - which has to happen every time every
+container starts. The only viable way to deal with new dependencies and requirements in production is to
+build and use your own image. You should only install dependencies dynamically in case of
+"hobbyist" and "quick start" scenarios when you want to iterate quickly to try things out and later
+replace it with your own images.
+
+How to build your own image
+---------------------------
+
+There are several typical scenarios that you will encounter, and here is a quick recipe on how to achieve
+your goal quickly. You can read further to understand the details, but for the simple cases using
+typical tools, here are some simple examples.
+
+In the simplest case, building your image consists of these steps:
+
+1) Create your own ``Dockerfile`` (name it ``Dockerfile``) where you add:
+
+* information about what your image should be based on (for example ``FROM apache/airflow:latest-python3.8``)
+
+* additional steps that should be executed in your image (typically in the form of ``RUN <command>``)
+
+2) Build your image. This can be done with ``docker`` CLI tools and examples below assume ``docker`` is used.
+   There are other tools like ``kaniko`` or ``podman`` that allow you to build the image, but ``docker`` is
+   so far the most popular and developer-friendly tool out there. A typical way of building the image looks
+   as follows (``my-custom-airflow-image-name`` is the custom name of your image). In case you use some
+   kind of registry from which you will be using the image, it is usually named in the form of
+   ``registry/image-name``. The name of the image has to be configured for the deployment method with which
+   your image will be deployed. This can be set for example as the image name in the
+   `docker-compose file <running-airflow-in-docker>`_ or in the `Helm chart <helm-chart>`_.
+
+.. code-block:: shell
+
+   docker build . -f Dockerfile -t my-custom-airflow-image-name
+
+
+3) Once you build the image locally, you usually have several options to make it available for your deployment:
+
+* For ``docker-compose`` deployment, that's all you need. The image is stored in the docker engine cache
+  and docker compose will use it from there.
+
+* For some - development targeted - Kubernetes deployments you can load the images directly into
+  Kubernetes clusters. Clusters such as ``kind`` or ``minikube`` have a dedicated ``load`` method to load
+  the images to the cluster.
+
+* Last but not least - you can push your image to a remote registry, which is the most common way
+  of storing and exposing the images, and it is the most portable way of publishing the image. Both
+  Docker-Compose and Kubernetes can make use of images exposed via registries.
+
+The most common scenarios where you want to build your own image are adding a new ``apt`` package,

Review comment:
       While technically you are right, similarly as in the case of Docker images (which are technically container images), most people have no idea they are using `deb` packages, but they know how to use `apt`. We are using the term 'apt dependencies' all over our documentation, and for the sake of consistency and better discoverability I prefer to leave it as 'apt dependencies' (even if this is technically not precise). We are using Debian Buster as our base image, so the `apt` term is more familiar to users of the image.

   Similarly, we use `PyPI` packages where technically those are `.whl` or `.sdist` packages. Some time ago we also used the term `pip` packages, but since some of our users use `poetry` or `pip-tools` (even if it is not recommended) and `PyPI` is familiar to most Python users, we use the term `PyPI packages`.





-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642917919



##########
File path: docs/helm-chart/quick-start.rst
##########
@@ -65,8 +65,17 @@ Run ``kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow``
 to port-forward the Airflow UI to http://localhost:8080/ to confirm
 Airflow is working.
 
-Build a Docker image from your DAGs
------------------------------------
+Extending Airflow Image

Review comment:
       I think this actually falls into a "quick start". Originally there was only "adding DAGs to your image" here, but I think adding PyPI/apt dependencies will be just as common during the "quick start". If you add your DAGs, you very likely want to add a dependency too. This was the top ask from Helm chart users - in Slack and elsewhere ("How can I add a new dependency?").

   I think many users do not realize how easy it is to build and deploy your image from your local environment (and that is needed sooner or later most of the time anyway for Airflow). I think this was the main reason why @MarkusTeufelberger and others used the "additional packages" option in the original helm chart - they did not realize (or did not want to jump through the extra hoop) that they can easily (and should) add PyPI/apt packages via a custom image. I think this is the original fallacy of the "dynamic" installation method. You are tempted to do everything by helm chart configuration.

   Unfortunately (or fortunately, depending on how you look at it) Kubernetes + Helm + Docker (container) images are all a leaky abstraction. You should understand (at least to some extent) all of them and be able to modify all of them when you want to deploy an application via the helm chart. So the docs have to be a little about all of it. By adding it where they will be looking for a "quick start", it's also a bit of educating the users that they can and should do it.

   Also, the "quick start" here is not a "generic" quick start. This is "Quick Start with kind" - a very specific case, where you use `kind` as a test-and-try platform (you should not use `kind` in production - it is intended for test and development). Building and loading images in kind is just a natural part of the flow with kind. And for the "quick start", you should find all the "quick need" answers on that single page rather than somewhere else (even if you could have a link). So I think adding the most common "quick-start" scenarios makes perfect sense, even if it involves building the image and not configuring the helm chart. I thought a bit about this and I am 100% sure this is the best place for it.

   For production deployment - yes, there is no need to describe it there; a simple reference to "Building the image" is enough, as for production deployment you have more experience and time to read it.

   Also, GitLab is a bit of a different story. GitLab works standalone out-of-the-box for the vast majority of cases. There are very few reasons you would want to build a GitLab image. On the other hand - I cannot imagine a "serious" deployment of Airflow where you would use the "reference" image. You need to learn to build your own Airflow image - the sooner, the better.
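   For illustration only, the kind flow could look roughly like this (the image name, cluster name and chart value names are assumptions - check the chart docs for the exact values):

   ```bash
   # Sketch only: build a custom image and load it into the kind cluster
   docker build . -t my-airflow:0.0.1
   kind load docker-image my-airflow:0.0.1 --name airflow-cluster

   # Point the chart at the custom image (value names assumed from the chart defaults)
   helm upgrade airflow apache-airflow/airflow --namespace airflow \
     --set images.airflow.repository=my-airflow \
     --set images.airflow.tag=0.0.1
   ```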
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642123633



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -311,6 +311,10 @@ if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
     create_www_user
 fi
 
+if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
+    pip install --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"

Review comment:
       ```suggestion
       pip install --no-cache-dir --user "${_PIP_ADDITIONAL_REQUIREMENTS=}"
   ```
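   For a quick test, the variable could then be passed at container start roughly like this (a sketch only - it assumes an image that already contains this entrypoint change, and the pinned package is a placeholder):

   ```bash
   # Development/testing only - the packages are re-installed on every container start
   docker run -it -e _PIP_ADDITIONAL_REQUIREMENTS="lxml==4.6.3" \
       my-custom-airflow-image-name bash -c "pip show lxml"
   ```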




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MarkusTeufelberger commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
MarkusTeufelberger commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642612873



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -185,66 +259,28 @@ database and creating an ``admin/admin`` Admin user with the following command:
 The commands above perform initialization of the SQLite database, create admin user with admin password
 and Admin role. They also forward local port ``8080`` to the webserver port and finally start the webserver.
 
-Waits for celery broker connection
-----------------------------------
-
-In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
-commands are used the entrypoint will wait until the celery broker DB connection is available.
-
-The script detects backend type depending on the URL schema and assigns default port numbers if not specified
-in the URL. Then it loops until connection to the host/port specified can be established
-It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
-To disable check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
-
-Supported schemes:
-
-* ``amqp(s)://``  (rabbitmq) - default port 5672
-* ``redis://``               - default port 6379
-* ``postgres://``            - default port 5432
-* ``mysql://``               - default port 3306
-
-Waiting for connection involves checking if a matching port is open.
-The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
-:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
-as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
-takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+Installing additional requirements
+..................................
 
-.. _entrypoint:commands:
+Installing additional requirements can be done by specifying ``_PIP_ADDITIONAL_REQUIREMENTS`` variable.
+The variable should contain a list of requirements that should be installed additionally when entering
+the containers. Note that this option slows down starting of Airflow as every time any container starts
+it must install new packages. Therefore this option should only be used for testing. When testing is
+finished, you should create your custom image with dependencies baked in.
 
-Executing commands
-------------------
+Not all dependencies can be installed this way. Dependencies that require compiling cannot be installed
+because they need ``build-essential`` installed. In case you get a compilation problem, you should fall back
+to ``customizing the image`` - this is the only good way to install dependencies that require compilation.

Review comment:
       Maybe you misunderstood the talos case? They explicitly tell you to run the `registry:2` container, which (unsurprisingly) _is_ a docker registry. This is NOT the local docker cache; it is a fully-fledged implementation of a docker registry. All you do then is push your images to this local registry and tell your cluster to pull from it in case the one on the big bad internet is not available (as would be the case for air-gapped systems). From the PoV of the cluster it is accessing an external registry; from the PoV of your machine it runs a container with a docker registry in it.
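   For reference, that flow could look roughly like this (a sketch only - names and the port are placeholders):

   ```bash
   # Run a throwaway local registry using the standard registry:2 image
   docker run -d -p 5000:5000 --name local-registry registry:2

   # Tag and push the custom image to it, then point the cluster at localhost:5000
   docker tag my-custom-airflow-image-name localhost:5000/my-custom-airflow-image-name
   docker push localhost:5000/my-custom-airflow-image-name
   ```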




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #16170: Adding extra requirements for build and runtime of the PROD image.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #16170:
URL: https://github.com/apache/airflow/pull/16170#discussion_r642123407



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -120,8 +120,82 @@ takes precedence over the :envvar:`AIRFLOW__CORE__SQL_ALCHEMY_CONN` variable.
 For newer versions, the ``airflow db check`` command is used, which means that a ``select 1 as is_alive;`` query
 is executed. This also means that you can keep your password in secret backend.
 
+Waits for celery broker connection
+----------------------------------
+
+In case Postgres or MySQL DB is used, and one of the ``scheduler``, ``celery``, ``worker``, or ``flower``
+commands is used, the entrypoint will wait until the celery broker connection is available.
+
+The script detects the backend type depending on the URL scheme and assigns default port numbers if not specified
+in the URL. Then it loops until a connection to the specified host/port can be established.
+It tries :envvar:`CONNECTION_CHECK_MAX_COUNT` times and sleeps :envvar:`CONNECTION_CHECK_SLEEP_TIME` between checks.
+To disable the check, set ``CONNECTION_CHECK_MAX_COUNT=0``.
+
+Supported schemes:
+
+* ``amqp(s)://``  (rabbitmq) - default port 5672
+* ``redis://``               - default port 6379
+* ``postgres://``            - default port 5432
+* ``mysql://``               - default port 3306
+
+Waiting for connection involves checking if a matching port is open.
+The host information is derived from the variables :envvar:`AIRFLOW__CELERY__BROKER_URL` and
+:envvar:`AIRFLOW__CELERY__BROKER_URL_CMD`. If :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
+is passed to the container, it is evaluated as a command to execute and result of this evaluation is used
+as :envvar:`AIRFLOW__CELERY__BROKER_URL`. The :envvar:`AIRFLOW__CELERY__BROKER_URL_CMD` variable
+takes precedence over the :envvar:`AIRFLOW__CELERY__BROKER_URL` variable.
+
+.. _entrypoint:commands:
+
+Executing commands
+------------------
+
+If the first argument equals "bash" - you are dropped into a bash shell, or you can execute a bash command
+if you specify extra arguments. For example:
+
+.. code-block:: bash
+
+  docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la"
+  total 16
+  drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
+  drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
+  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
+  drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
+
+If the first argument is equal to ``python`` - you are dropped into a python shell, or python commands are executed if
+you pass extra parameters. For example:
+
+.. code-block:: bash
+
+  > docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')"
+  test
+
+If the first argument equals "airflow" - the rest of the arguments are treated as an airflow command
+to execute. Example:
+
+.. code-block:: bash
+
+   docker run -it apache/airflow:2.1.0-python3.6 airflow webserver
+
+If there are any other arguments - they are simply passed to the "airflow" command
+
+.. code-block:: bash
+
+  > docker run -it apache/airflow:2.1.0-python3.6 version
+  2.1.0
+
+Additional quick test options
+-----------------------------
+
+The options below are mostly used for quickly testing the image - for example with
+quick-start docker-compose or when you want to perform a local test with new packages
+added. They are not supposed to be run in the production environment as they add additional
+overhead for execution of additional commands. Those options in production should be executed

Review comment:
       ```suggestion
   overhead for execution of additional commands. Those options in production should be realized
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org