You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "o-nikolas (via GitHub)" <gi...@apache.org> on 2023/02/10 18:48:35 UTC
[GitHub] [airflow] o-nikolas opened a new pull request, #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile
o-nikolas opened a new pull request, #29469:
URL: https://github.com/apache/airflow/pull/29469
While going through the provider release verification steps I found the documentation lacking and a bit vague in places. I made some updates to the README.
Also made some updates around the Dockerfile.pmc to make it a bit more user friendly.
Rendered screenshot of the modified sections:
![Screenshot from 2023-02-10 10-44-33](https://user-images.githubusercontent.com/65743084/218173220-0d693dde-b258-4439-8dc1-39472e0716a6.png)
<!--
Thank you for contributing! Please make sure that your code changes
are covered with tests. And in case of new features or big changes
remember to adjust the documentation.
Feel free to ping committers for the review!
In case of an existing issue, reference it using one of the following:
closes: #ISSUE
related: #ISSUE
How to write a good git commit message:
http://chris.beams.io/posts/git-commit/
-->
---
**^ Add meaningful description above**
Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk merged pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk merged PR #29469:
URL: https://github.com/apache/airflow/pull/29469
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] o-nikolas commented on a diff in pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile
Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on code in PR #29469:
URL: https://github.com/apache/airflow/pull/29469#discussion_r1103359205
##########
dev/README_RELEASE_PROVIDER_PACKAGES.md:
##########
@@ -561,14 +561,41 @@ Or update it if you already checked it out:
svn update .
```
-Optionally you can use `check_files.py` script to verify that all expected files are
-present in SVN. This script may help also with verifying installation of the packages.
+Optionally you can use the [`check_files.py`](https://github.com/apache/airflow/blob/main/dev/check_files.py)
+script to verify that all expected files are present in SVN. This script will produce a `Dockerfile.pmc` which
+may help with verifying installation of the packages.
```shell script
# Copy the list of packages (pypi urls) into `packages.txt` then run:
python check_files.py providers -p {PATH_TO_SVN}
```
+After the above script completes you can build `Dockerfile.pmc` to trigger an installation of each provider
+package:
+
+```shell script
+docker build - < Dockerfile.pmc
+```
+
+**Note**: This may fail to install some providers. For example, if they require some system level dependencies
+that aren't present in the image. If you wish to investigate you may update the Dockerfile to install any
+missing dependencies and then try to build again.
+
+For example, currently `apache-airflow-providers-apache-hive` requires the `libsasl2` system dependency. To
Review Comment:
I tested with this base image and it does indeed work nicely. I don't love that it's pinned to a specific python version (which will eventually need to be updated) but I think it's worth it to get this process to be more seamless.
I'll update the PR
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #29469:
URL: https://github.com/apache/airflow/pull/29469#discussion_r1106545002
##########
dev/README_RELEASE_PROVIDER_PACKAGES.md:
##########
@@ -561,14 +561,41 @@ Or update it if you already checked it out:
svn update .
```
-Optionally you can use `check_files.py` script to verify that all expected files are
-present in SVN. This script may help also with verifying installation of the packages.
+Optionally you can use the [`check_files.py`](https://github.com/apache/airflow/blob/main/dev/check_files.py)
+script to verify that all expected files are present in SVN. This script will produce a `Dockerfile.pmc` which
+may help with verifying installation of the packages.
```shell script
# Copy the list of packages (pypi urls) into `packages.txt` then run:
python check_files.py providers -p {PATH_TO_SVN}
```
+After the above script completes you can build `Dockerfile.pmc` to trigger an installation of each provider
+package:
+
+```shell script
+docker build - < Dockerfile.pmc
+```
+
+**Note**: This may fail to install some providers. For example, if they require some system level dependencies
+that aren't present in the image. If you wish to investigate you may update the Dockerfile to install any
+missing dependencies and then try to build again.
+
+For example, currently `apache-airflow-providers-apache-hive` requires the `libsasl2` system dependency. To
Review Comment:
Yes. Good point (sorry I missed it).
This is because you get the providers also in the "airlfow/providers" source code (this is how airflow is installed in the CI image - not from package, but from sources copied to inside the code, so that the source code can be replaced by just mounting the "airflow" volume in the place where airflow is installed.
The solution is simple. Remove `/opt/airflow/airflow/providers" right after "FROM".
We use similar approach in our CI when we test installation of providers, but in this case we are removing the whole "airflow" folder by ... (wait for it) ... mounting an empty directory in it's place :) (see below).
In your case it's easier to remove it in Dockerfile as we do not care about the size of the image (the layer with all airflow sources will stay in the image and you build the image anyway. In case of CI, we do not want to build extra image and we can simply set env var that turns into option and mounts the empty dir (which we do automatically when ``--use-airflow-version`` is used https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/params/shell_params.py#L220 :
Try it yourself with breeze (and you can use `--dry-run` to see the actual command or look at the breeze source code):
```
> breeze shell --mount-sources skip
...
root@a51c9ce0ae89:/opt/airflow# ls -la airflow/
total 276
drwxr-xr-x 60 root root 1920 Feb 13 22:40 .
drwxr-xr-x 1 root root 4096 Feb 15 00:50 ..
-rw-r--r-- 1 root root 4670 Feb 12 18:21 __init__.py
-rw-r--r-- 1 root root 1403 Jan 13 11:23 __main__.py
drwxr-xr-x 5 root root 160 Dec 18 14:57 _vendor
-rw-r--r-- 1 root root 2320 Sep 10 14:26 alembic.ini
drwxr-xr-x 6 root root 192 Feb 12 22:21 api
drwxr-xr-x 10 root root 320 Feb 12 22:21 api_connexion
drwxr-xr-x 6 root root 192 Feb 13 22:40 api_internal
drwxr-xr-x 7 root root 224 Feb 13 22:40 callbacks
....
drwxr-xr-x 65 root root 2080 Dec 29 20:50 providers
-rw-r--r-- 1 root root 44445 Jan 13 11:23 providers_manager.py
-rw-r--r-- 1 root root 846 Sep 10 14:26 py.typed
drwxr-xr-x 7 root root 224 Feb 13 22:40 secrets
```
vs.
```
> breeze shell --mount-sources remove
...
root@7ff23b566750:/opt/airflow# ls -la airflow/
total 4
drwxr-xr-x 3 root root 96 May 16 2022 .
drwxr-xr-x 1 root root 4096 Feb 15 00:51 ..
-rw-r--r-- 1 root root 0 May 16 2022 .gitignore
root@7ff23b566750:/opt/airflow#
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] o-nikolas commented on a diff in pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile
Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on code in PR #29469:
URL: https://github.com/apache/airflow/pull/29469#discussion_r1103374145
##########
dev/README_RELEASE_PROVIDER_PACKAGES.md:
##########
@@ -561,14 +561,41 @@ Or update it if you already checked it out:
svn update .
```
-Optionally you can use `check_files.py` script to verify that all expected files are
-present in SVN. This script may help also with verifying installation of the packages.
+Optionally you can use the [`check_files.py`](https://github.com/apache/airflow/blob/main/dev/check_files.py)
+script to verify that all expected files are present in SVN. This script will produce a `Dockerfile.pmc` which
+may help with verifying installation of the packages.
```shell script
# Copy the list of packages (pypi urls) into `packages.txt` then run:
python check_files.py providers -p {PATH_TO_SVN}
```
+After the above script completes you can build `Dockerfile.pmc` to trigger an installation of each provider
+package:
+
+```shell script
+docker build - < Dockerfile.pmc
+```
+
+**Note**: This may fail to install some providers. For example, if they require some system level dependencies
+that aren't present in the image. If you wish to investigate you may update the Dockerfile to install any
+missing dependencies and then try to build again.
+
+For example, currently `apache-airflow-providers-apache-hive` requires the `libsasl2` system dependency. To
Review Comment:
I actually noticed something weird. The install of the RC providers works when using the ci image as the dockerfile base image. But when I run `airflow info` I don't see the RC versions being used:
#### The build:
```
$ docker build -f Dockerfile.pmc --tag pmc_ci_image .
[+] Building 153.3s (21/21) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile.pmc 0.0s
=> => transferring dockerfile: 1.35kB 0.0s
=> [internal] load metadata for ghcr.io/apache/airflow/main/ci/python3.10:latest 0.0s
=> [ 1/17] FROM ghcr.io/apache/airflow/main/ci/python3.10 0.1s
=> [ 2/17] RUN pip install 'apache-airflow-providers-amazon==7.2.0rc1' 9.4s
=> [ 3/17] RUN pip install 'apache-airflow-providers-apache-beam==4.2.0rc1' 8.5s
=> [ 4/17] RUN pip install 'apache-airflow-providers-apache-hive==5.1.2rc1' 8.8s
=> [ 5/17] RUN pip install 'apache-airflow-providers-arangodb==2.1.1rc1' 8.6s
=> [ 6/17] RUN pip install 'apache-airflow-providers-cncf-kubernetes==5.2.0rc1' 11.4s
=> [ 7/17] RUN pip install 'apache-airflow-providers-dbt-cloud==3.0.0rc1' 8.7s
=> [ 8/17] RUN pip install 'apache-airflow-providers-elasticsearch==4.4.0rc1' 9.1s
=> [ 9/17] RUN pip install 'apache-airflow-providers-ftp==3.3.1rc1' 7.3s
=> [10/17] RUN pip install 'apache-airflow-providers-google==8.9.0rc1' 16.2s
=> [11/17] RUN pip install 'apache-airflow-providers-microsoft-azure==5.2.0rc1' 9.9s
=> [12/17] RUN pip install 'apache-airflow-providers-mysql==4.0.1rc1' 9.4s
=> [13/17] RUN pip install 'apache-airflow-providers-presto==4.2.2rc1' 9.1s
=> [14/17] RUN pip install 'apache-airflow-providers-sftp==4.2.2rc1' 9.4s
=> [15/17] RUN pip install 'apache-airflow-providers-snowflake==4.0.3rc1' 9.0s
=> [16/17] RUN pip install 'apache-airflow-providers-tableau==4.1.0rc1' 8.8s
=> [17/17] RUN pip install 'apache-airflow-providers-trino==4.3.2rc1' 8.8s
=> exporting to image 0.7s
=> => exporting layers 0.6s
=> => writing image sha256:d549269c2ffa35f57fb54b8eef371bcbd5d60e98b5ca39b5eaed171c81674bd1 0.0s
=> => naming to docker.io/library/pmc_ci_image
```
#### Checking the version (I see the incorrect version)
```
$ docker run --rm --entrypoint airflow pmc_ci_image info | grep amazon
apache-airflow-providers-amazon | 7.1.0
```
#### Checking the versions with the image based on `apache:airflow/latest` (I see the correct RC version)
```
$ docker run --rm pmc airflow info | grep amazon
apache-airflow-providers-amazon | 7.2.0rc1
```
Any idea what's going on @potiuk?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] o-nikolas commented on pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile
Posted by "o-nikolas (via GitHub)" <gi...@apache.org>.
o-nikolas commented on PR #29469:
URL: https://github.com/apache/airflow/pull/29469#issuecomment-1452742994
@potiuk I think this one is now ready for another round of review
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #29469: Updates to provider release verification documentation and check_files to produce a more useful Dockerfile
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #29469:
URL: https://github.com/apache/airflow/pull/29469#discussion_r1103228230
##########
dev/README_RELEASE_PROVIDER_PACKAGES.md:
##########
@@ -561,14 +561,41 @@ Or update it if you already checked it out:
svn update .
```
-Optionally you can use `check_files.py` script to verify that all expected files are
-present in SVN. This script may help also with verifying installation of the packages.
+Optionally you can use the [`check_files.py`](https://github.com/apache/airflow/blob/main/dev/check_files.py)
+script to verify that all expected files are present in SVN. This script will produce a `Dockerfile.pmc` which
+may help with verifying installation of the packages.
```shell script
# Copy the list of packages (pypi urls) into `packages.txt` then run:
python check_files.py providers -p {PATH_TO_SVN}
```
+After the above script completes you can build `Dockerfile.pmc` to trigger an installation of each provider
+package:
+
+```shell script
+docker build - < Dockerfile.pmc
+```
+
+**Note**: This may fail to install some providers. For example, if they require some system level dependencies
+that aren't present in the image. If you wish to investigate you may update the Dockerfile to install any
+missing dependencies and then try to build again.
+
+For example, currently `apache-airflow-providers-apache-hive` requires the `libsasl2` system dependency. To
Review Comment:
If we replace `FROM apache/airflow:latest` with `FROM ghcr.io/apache/airflow/main/ci/python3.10` or smth - that might be not needed any more. The CI image has all that is needed.
Also I've been thinking about releasing "fat" airflow image - as opposed to "slim" (go figure). Several users would have less problems if we had one. This might be the right long-term solution.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org