You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "o-nikolas (via GitHub)" <gi...@apache.org> on 2023/02/10 18:48:35 UTC

[GitHub] [airflow] o-nikolas opened a new pull request, #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile

o-nikolas opened a new pull request, #29469:
URL: https://github.com/apache/airflow/pull/29469

   While going through the provider release verification steps I found the documentation lacking and a bit vague in places. I made some updates to the README.
   
   Also made some updates around the Dockerfile.pmc to make it a bit more user friendly.
   
   Rendered screenshot of the modified sections:
    
   ![Screenshot from 2023-02-10 10-44-33](https://user-images.githubusercontent.com/65743084/218173220-0d693dde-b258-4439-8dc1-39472e0716a6.png)
   
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk merged PR #29469:
URL: https://github.com/apache/airflow/pull/29469


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] o-nikolas commented on a diff in pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile

Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on code in PR #29469:
URL: https://github.com/apache/airflow/pull/29469#discussion_r1103359205


##########
dev/README_RELEASE_PROVIDER_PACKAGES.md:
##########
@@ -561,14 +561,41 @@ Or update it if you already checked it out:
 svn update .
 ```
 
-Optionally you can use `check_files.py` script to verify that all expected files are
-present in SVN. This script may help also with verifying installation of the packages.
+Optionally you can use the [`check_files.py`](https://github.com/apache/airflow/blob/main/dev/check_files.py)
+script to verify that all expected files are present in SVN. This script will produce a `Dockerfile.pmc` which
+may help with verifying installation of the packages.
 
 ```shell script
 # Copy the list of packages (pypi urls) into `packages.txt` then run:
 python check_files.py providers -p {PATH_TO_SVN}
 ```
 
+After the above script completes you can build `Dockerfile.pmc` to trigger an installation of each provider
+package:
+
+```shell script
+docker build - < Dockerfile.pmc
+```
+
+**Note**: This may fail to install some providers. For example, if they require some system level dependencies
+that aren't present in the image. If you wish to investigate you may update the Dockerfile to install any
+missing dependencies and then try to build again.
+
+For example, currently `apache-airflow-providers-apache-hive` requires the `libsasl2` system dependency. To

Review Comment:
   I tested with this base image and it does indeed work nicely. I don't love that it's pinned to a specific python version (which will eventually need to be updated) but I think it's worth it to get this process to be more seamless.
   
   I'll update the PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #29469:
URL: https://github.com/apache/airflow/pull/29469#discussion_r1106545002


##########
dev/README_RELEASE_PROVIDER_PACKAGES.md:
##########
@@ -561,14 +561,41 @@ Or update it if you already checked it out:
 svn update .
 ```
 
-Optionally you can use `check_files.py` script to verify that all expected files are
-present in SVN. This script may help also with verifying installation of the packages.
+Optionally you can use the [`check_files.py`](https://github.com/apache/airflow/blob/main/dev/check_files.py)
+script to verify that all expected files are present in SVN. This script will produce a `Dockerfile.pmc` which
+may help with verifying installation of the packages.
 
 ```shell script
 # Copy the list of packages (pypi urls) into `packages.txt` then run:
 python check_files.py providers -p {PATH_TO_SVN}
 ```
 
+After the above script completes you can build `Dockerfile.pmc` to trigger an installation of each provider
+package:
+
+```shell script
+docker build - < Dockerfile.pmc
+```
+
+**Note**: This may fail to install some providers. For example, if they require some system level dependencies
+that aren't present in the image. If you wish to investigate you may update the Dockerfile to install any
+missing dependencies and then try to build again.
+
+For example, currently `apache-airflow-providers-apache-hive` requires the `libsasl2` system dependency. To

Review Comment:
   Yes. Good point (sorry I missed it). 
   
   This is because you get the providers also in the "airlfow/providers" source code (this is how airflow is installed in the CI image - not from package, but from sources copied to inside the code, so that the source code can be replaced by just mounting the "airflow" volume in the place where airflow is installed. 
   
   The solution is simple. Remove `/opt/airflow/airflow/providers" right after "FROM". 
   
   We use similar approach in our CI when we test installation of providers, but in this case we are removing the whole "airflow" folder by ... (wait for it) ... mounting an empty directory in it's place :) (see below).
   
   In your case it's easier to remove it in Dockerfile as we do not care about the size of the image (the layer with all airflow sources will stay in the image and you build the image anyway. In case of CI, we do not want to build extra image and we can simply set env var that turns into option and mounts the empty dir (which we do automatically when ``--use-airflow-version`` is used https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/params/shell_params.py#L220 : 
   
   Try it yourself with breeze (and you can use `--dry-run` to see the actual command or look at the breeze source code):
   
   ```
   >  breeze shell --mount-sources skip
   ...
   root@a51c9ce0ae89:/opt/airflow# ls -la airflow/
   total 276
   drwxr-xr-x 60 root root  1920 Feb 13 22:40 .
   drwxr-xr-x  1 root root  4096 Feb 15 00:50 ..
   -rw-r--r--  1 root root  4670 Feb 12 18:21 __init__.py
   -rw-r--r--  1 root root  1403 Jan 13 11:23 __main__.py
   drwxr-xr-x  5 root root   160 Dec 18 14:57 _vendor
   -rw-r--r--  1 root root  2320 Sep 10 14:26 alembic.ini
   drwxr-xr-x  6 root root   192 Feb 12 22:21 api
   drwxr-xr-x 10 root root   320 Feb 12 22:21 api_connexion
   drwxr-xr-x  6 root root   192 Feb 13 22:40 api_internal
   drwxr-xr-x  7 root root   224 Feb 13 22:40 callbacks
   
   ....
   
   drwxr-xr-x 65 root root  2080 Dec 29 20:50 providers
   -rw-r--r--  1 root root 44445 Jan 13 11:23 providers_manager.py
   -rw-r--r--  1 root root   846 Sep 10 14:26 py.typed
   drwxr-xr-x  7 root root   224 Feb 13 22:40 secrets
   ```
   
   vs. 
   
   ```
   > breeze shell --mount-sources remove
   ...
   
   root@7ff23b566750:/opt/airflow# ls -la airflow/
   total 4
   drwxr-xr-x 3 root root   96 May 16  2022 .
   drwxr-xr-x 1 root root 4096 Feb 15 00:51 ..
   -rw-r--r-- 1 root root    0 May 16  2022 .gitignore
   root@7ff23b566750:/opt/airflow#
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] o-nikolas commented on a diff in pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile

Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on code in PR #29469:
URL: https://github.com/apache/airflow/pull/29469#discussion_r1103374145


##########
dev/README_RELEASE_PROVIDER_PACKAGES.md:
##########
@@ -561,14 +561,41 @@ Or update it if you already checked it out:
 svn update .
 ```
 
-Optionally you can use `check_files.py` script to verify that all expected files are
-present in SVN. This script may help also with verifying installation of the packages.
+Optionally you can use the [`check_files.py`](https://github.com/apache/airflow/blob/main/dev/check_files.py)
+script to verify that all expected files are present in SVN. This script will produce a `Dockerfile.pmc` which
+may help with verifying installation of the packages.
 
 ```shell script
 # Copy the list of packages (pypi urls) into `packages.txt` then run:
 python check_files.py providers -p {PATH_TO_SVN}
 ```
 
+After the above script completes you can build `Dockerfile.pmc` to trigger an installation of each provider
+package:
+
+```shell script
+docker build - < Dockerfile.pmc
+```
+
+**Note**: This may fail to install some providers. For example, if they require some system level dependencies
+that aren't present in the image. If you wish to investigate you may update the Dockerfile to install any
+missing dependencies and then try to build again.
+
+For example, currently `apache-airflow-providers-apache-hive` requires the `libsasl2` system dependency. To

Review Comment:
   I actually noticed something weird. The install of the RC providers works when using the ci image as the dockerfile base image. But when I run `airflow info` I don't see the RC versions being used:
   #### The build:
   ```
   $ docker build -f Dockerfile.pmc --tag pmc_ci_image .
   
   [+] Building 153.3s (21/21) FINISHED                                                                                  
    => [internal] load .dockerignore                                                                                0.0s
    => => transferring context: 2B                                                                                  0.0s
    => [internal] load build definition from Dockerfile.pmc                                                         0.0s
    => => transferring dockerfile: 1.35kB                                                                           0.0s
    => [internal] load metadata for ghcr.io/apache/airflow/main/ci/python3.10:latest                                0.0s
    => [ 1/17] FROM ghcr.io/apache/airflow/main/ci/python3.10                                                       0.1s
    => [ 2/17] RUN pip install 'apache-airflow-providers-amazon==7.2.0rc1'                                          9.4s
    => [ 3/17] RUN pip install 'apache-airflow-providers-apache-beam==4.2.0rc1'                                     8.5s
    => [ 4/17] RUN pip install 'apache-airflow-providers-apache-hive==5.1.2rc1'                                     8.8s 
    => [ 5/17] RUN pip install 'apache-airflow-providers-arangodb==2.1.1rc1'                                        8.6s 
    => [ 6/17] RUN pip install 'apache-airflow-providers-cncf-kubernetes==5.2.0rc1'                                11.4s 
    => [ 7/17] RUN pip install 'apache-airflow-providers-dbt-cloud==3.0.0rc1'                                       8.7s 
    => [ 8/17] RUN pip install 'apache-airflow-providers-elasticsearch==4.4.0rc1'                                   9.1s 
    => [ 9/17] RUN pip install 'apache-airflow-providers-ftp==3.3.1rc1'                                             7.3s 
    => [10/17] RUN pip install 'apache-airflow-providers-google==8.9.0rc1'                                         16.2s 
    => [11/17] RUN pip install 'apache-airflow-providers-microsoft-azure==5.2.0rc1'                                 9.9s 
    => [12/17] RUN pip install 'apache-airflow-providers-mysql==4.0.1rc1'                                           9.4s 
    => [13/17] RUN pip install 'apache-airflow-providers-presto==4.2.2rc1'                                          9.1s 
    => [14/17] RUN pip install 'apache-airflow-providers-sftp==4.2.2rc1'                                            9.4s 
    => [15/17] RUN pip install 'apache-airflow-providers-snowflake==4.0.3rc1'                                       9.0s 
    => [16/17] RUN pip install 'apache-airflow-providers-tableau==4.1.0rc1'                                         8.8s 
    => [17/17] RUN pip install 'apache-airflow-providers-trino==4.3.2rc1'                                           8.8s 
    => exporting to image                                                                                           0.7s 
    => => exporting layers                                                                                          0.6s 
    => => writing image sha256:d549269c2ffa35f57fb54b8eef371bcbd5d60e98b5ca39b5eaed171c81674bd1                     0.0s
    => => naming to docker.io/library/pmc_ci_image
    ```
    #### Checking the version (I see the incorrect version)
    ```
    $ docker run --rm --entrypoint airflow pmc_ci_image info | grep amazon
   
     apache-airflow-providers-amazon           | 7.1.0
   ```
   #### Checking the versions with the image based on `apache:airflow/latest` (I see the correct RC version)
   ```
   $ docker run --rm pmc airflow info | grep amazon
   
    apache-airflow-providers-amazon          | 7.2.0rc1
   ```
   Any idea what's going on @potiuk?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] o-nikolas commented on pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile

Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on PR #29469:
URL: https://github.com/apache/airflow/pull/29469#issuecomment-1452742994

   @potiuk I think this one is now ready for another round of review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #29469:
URL: https://github.com/apache/airflow/pull/29469#discussion_r1103228230


##########
dev/README_RELEASE_PROVIDER_PACKAGES.md:
##########
@@ -561,14 +561,41 @@ Or update it if you already checked it out:
 svn update .
 ```
 
-Optionally you can use `check_files.py` script to verify that all expected files are
-present in SVN. This script may help also with verifying installation of the packages.
+Optionally you can use the [`check_files.py`](https://github.com/apache/airflow/blob/main/dev/check_files.py)
+script to verify that all expected files are present in SVN. This script will produce a `Dockerfile.pmc` which
+may help with verifying installation of the packages.
 
 ```shell script
 # Copy the list of packages (pypi urls) into `packages.txt` then run:
 python check_files.py providers -p {PATH_TO_SVN}
 ```
 
+After the above script completes you can build `Dockerfile.pmc` to trigger an installation of each provider
+package:
+
+```shell script
+docker build - < Dockerfile.pmc
+```
+
+**Note**: This may fail to install some providers. For example, if they require some system level dependencies
+that aren't present in the image. If you wish to investigate you may update the Dockerfile to install any
+missing dependencies and then try to build again.
+
+For example, currently `apache-airflow-providers-apache-hive` requires the `libsasl2` system dependency. To

Review Comment:
   If we replace `FROM apache/airflow:latest` with `FROM  ghcr.io/apache/airflow/main/ci/python3.10` or smth - that might be not needed any more. The CI image has all that is needed.
   
   Also I've been thinking about releasing "fat" airflow image - as opposed to "slim" (go figure). Several users would have less problems if we had one. This might be the right long-term solution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org