You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/02 17:00:21 UTC

[GitHub] [airflow] potiuk opened a new pull request #15162: Fail in case wrong UID or GID is used for prod image

potiuk opened a new pull request #15162:
URL: https://github.com/apache/airflow/pull/15162


   The PROD image of airflow is OpenShift compatible and it can be
   run with either 'airflow' user (UID=50000) or with any other
   user with (GID=0).
   
   Fixes: #15107
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812641980


   BTW. I 'd love to cherry-pick it before we prepare 2.0.2 (because more users have questions about it recently)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#discussion_r606665326



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -244,6 +243,47 @@ function exec_to_bash_or_python_command_if_specified() {
     fi
 }
 
+function check_uid_gid() {
+    if [[ $(id -g) == "0" ]]; then
+        return
+    fi
+    if [[ $(id -u) == "50000" ]]; then
+        >&2 echo

Review comment:
       I turned it into warning only if 'airflow' user is used and GID is not 0. This should be quite nice for the users:
   
   * existing helm charts will work
   * it's rather OK to run airflow with any GID as long as you are never going to use the arbitrary UID for already created directories
   * we clearly warn the users and they will see it in their log files that GID=0 is the "recommended" way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812862890


   I thought about it a bit more and added a bit more clearer description (and example) on why we are doing all that. And I closed some of the loopholes (for example I realized that I removed 50000 GID from airflow rather than made it supplementary group).
   
   I think with those explanations and examples and the entrypoint script redirecting the user to that documentation, we should not have any more issues/questions from our users. Looking forward to getting it reviewed quickly.
   
   
   @ashb @kaxil  - when are we cutting 2.0.2? I would really love to get that one in. This will guide the users better, when trying to build their own images based on Airflow's reference one. 
   
   I think it closes quite a number of potential mistakes of the users trying to build their own images


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812677042


   > I read that guide in openshift and come to the complete opposite conclusion: we should support running as whatever user id the user chooses.
   
   Yeah. I actually fully agree with that statement. And we can do that already (our image allows that),
   
   But the gudelines also say (and I follow it) that in order to make it possible without additional extra effort, the best way is to also always run the image with GID=0 and make your image prepared for that - every folder in the image to have GID=0 as primary group and all the access rights synchronized so that rights of the owner and rights of the group are the same.
   
   Following this avoids many pitfalls when you think about extendability of the image . For example if you build your image FROM: and want to install new package, if you will do it as an `airflow` user (because  you do not want to change it back to root and again back to airflow) with the current approach, it will still be "OpenShift-compatible" and work without any more effort.
   
   Which actually made me check something - I am going to update this one a bit and add `umask 0002` instead of the default `0022` which will make it truly working out-of-the box for arbitrary user even if the installation of such 3-rd party package will create a new writeable directory.  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812643417


   @ashb  - do you have any concerns about this ? It's pretty standard problem with any docker images out there, how to make them run with arbitrary user ids and there are couple of solutions (like changing directory permissions dynamically when entering the image or making them rxw to "other". The openshift solution is elegant and simple and secure (no root user used to run anything in the image), also with recent changes where 'root' group is the default and --local flag in PIP is default as well, we have a complete solution that works out-of-the-box in most scenarios - this entrypoint warning is just a way to make the users aware about the limitations. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812639567


   I just pushed a bit better error message returned in this case, so that it is much clearer. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812667736


   I read that guide in openshift and come to the complete opposite conclusion: we should support running as whatever user id the user chooses.
   
   I don't have a problem with this change though, just wanted to know what the problem/exact error was since the linked issue didn't mention.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-815911399


   Hi @potiuk, Ash is on vacation this week, I think this looks good though!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#discussion_r606667513



##########
File path: docs/docker-stack/entrypoint.rst
##########
@@ -87,13 +87,49 @@ The image entrypoint works as follows:
   command to execute and result of this evaluation is used as ``AIRFLOW__CELERY__BROKER_URL``. The
   ``_CMD`` variable takes precedence over the ``AIRFLOW__CELERY__BROKER_URL`` variable.
 
-Creating system user
---------------------
+.. _arbitrary-docker-user:
+
+Allowing arbitrary user to run the container
+--------------------------------------------
+
+Airflow image is Open-Shift compatible, which means that you can start it with random user ID and the
+group id ``0`` (``root``). If you want to run the image with user different than Airflow, you MUST set
+GID of the user to ``0``. In case you try to use different group, the entrypoint exits with error.
+
+In order to accommodate a number of external libraries and projects, Airflow will automatically create
+such an arbitrary user in (`/etc/passwd`) and make it's home directory point to ``/home/airflow``.

Review comment:
       ```suggestion
   such an arbitrary user in (``/etc/passwd``) and make it's home directory point to ``/home/airflow``.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#discussion_r606376745



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -244,6 +243,30 @@ function exec_to_bash_or_python_command_if_specified() {
     fi
 }
 
+function check_uid_gid() {
+    if [[ $(id -u) == "50000" ]]; then
+        return
+    fi
+    if [[ $(id -g) == "0" ]]; then
+        return
+    fi
+    >&2 echo
+    >&2 echo "ERROR! you should run the image with UID=50000 or GID=0"
+    >&2 echo
+    >&2 echo "The image can be run in two ways when it comes to UID:"
+    >&2 echo "  * with 'airflow' user (UID=50000) and any GID"
+    >&2 echo "  * with any user and GID=0 (root)"
+    >&2 echo
+    >&2 echo " You start the image with UID=$(id -u) and GID=$(id -g)"
+    >&2 echo
+    >&2 echo " This is to learn from the guidelines of OpenShift and make the image OpenShift compatible"
+    >&2 echo " See more about it in the docker image documentation on http://airflow.apache.org/docs/"
+    >&2 echo " or https://docs.openshift.com/container-platform/4.7/openshift_images/create-images.html#images-create-guide-openshift_create-images"

Review comment:
       I don't think linking to a doc about building an image on openshift is relevant for users running this image.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #15162:
URL: https://github.com/apache/airflow/pull/15162


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812643417


   @ash - do you have any concerns about this ? It's pretty standard problem with any docker images out there, how to make them run with arbitrary user ids and there are couple of solutions (like changing directory permissions dynamically when entering the image or making them rxw to "other". The openshift solution is elegant and simple and secure (no root user used to run anything in the image), also with recent changes where 'root' group is the default and --local flag in PIP is default as well, we have a complete solution that works out-of-the-box in most scenarios - this entrypoint warning is just a way to make the users aware about the limitations. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812643417


   @ash - do you have any concerns about this ? It's pretty standard problem with any docker images out there, how to make them run with arbitrary user ids and there are couple of solution (like changing directory permissions dynamically when entering the image or making them rxw to "other". The openshift solution is elegant and simple and secure (no root user used to run anything in the image), also with recent changes where 'root' group is the default and --local flag in PIP is default as well, we have a complete solution that works out-of-the-box in most scenarios - this entrypoint warning is just a way to make the users aware about the limitations. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812873098


   [The Workflow run](https://github.com/apache/airflow/actions/runs/714442116) is cancelling this PR. Building image for the PR has been cancelled


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-816005856


   Thanks @dimberman !
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-814372210


   Hey @ashb / @kaxil/ @mik-laj ->  any more comments here for the UID feature? I think I described it pretty well now and we have warning rather than error where airflow is used with the original UID. Seems like this one might save us some issues from users. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812643417


   @ash - do you have any concerns about this ? It's pretty standard problem with any docker images out there, how to make them run with arbitrary user ids and there are couple of solution (like changing directory permissions dynamically when entering the image or making them rxw to "other". The openshift solution is elegant and simple, also with recent changes where 'root' group is the default and --local flag in PIP is default as well, we have a complete solution that works out-of-the-box in most scenarios - this entrypoint warning is just a way to make the users aware about the limitations. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-815402975


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812681557


   I added the umask and changed the description/subject of the commit to better describe the meaning of this now. Also I added a comment in the entrypoint explaining why umask is added, and linked to the new, updated documentation (already cherry-picked to 2.0.2 so it should be linking to the right part of the documentation when we release it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812639154


   It's about access right to a number of directories. This is very common pattern (for example used in OpenShift) and we already have a number of issues opened about it (for example just opened issue here which describes what happens and where the problems are: https://github.com/apache/airflow/issues/15161#issuecomment-812606687) 
   
   This is also very closely following the openshift guidelines: 
   
   https://docs.openshift.com/container-platform/4.7/openshift_images/create-images.html#images-create-guide-openshift_create-images
   
   All the reasons for this approach are described there in detail in th "support for arbitrary user ids" chapter in the openshift documentation - I will not copy it here, they described it pretty well there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-815117601


   Anyone :) ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812633287


   What fails of the you uid is incorrect?
   
   So long as the dags can be read, and the logs folder written to the image will work I think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15162: Better compatibility/diagnostics for arbitrary UID in docker image

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#issuecomment-812878501


   [The Workflow run](https://github.com/apache/airflow/actions/runs/714561502) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #15162: Fail in case wrong UID or GID is used for prod image

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #15162:
URL: https://github.com/apache/airflow/pull/15162#discussion_r606385727



##########
File path: scripts/in_container/prod/entrypoint_prod.sh
##########
@@ -244,6 +243,30 @@ function exec_to_bash_or_python_command_if_specified() {
     fi
 }
 
+function check_uid_gid() {
+    if [[ $(id -u) == "50000" ]]; then
+        return
+    fi
+    if [[ $(id -g) == "0" ]]; then
+        return
+    fi
+    >&2 echo
+    >&2 echo "ERROR! you should run the image with UID=50000 or GID=0"
+    >&2 echo
+    >&2 echo "The image can be run in two ways when it comes to UID:"
+    >&2 echo "  * with 'airflow' user (UID=50000) and any GID"
+    >&2 echo "  * with any user and GID=0 (root)"
+    >&2 echo
+    >&2 echo " You start the image with UID=$(id -u) and GID=$(id -g)"
+    >&2 echo
+    >&2 echo " This is to learn from the guidelines of OpenShift and make the image OpenShift compatible"
+    >&2 echo " See more about it in the docker image documentation on http://airflow.apache.org/docs/"
+    >&2 echo " or https://docs.openshift.com/container-platform/4.7/openshift_images/create-images.html#images-create-guide-openshift_create-images"

Review comment:
       Agree. We are going to have a new "docker" area in teh 2.0.2 image. I am going to point to that - in the updated documentation, I already explained it in more detailed and linked to openshift guidelines.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org