Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/08 15:19:47 UTC

[GitHub] [airflow] Rajpratik71 opened a new pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Rajpratik71 opened a new pull request #11352:
URL: https://github.com/apache/airflow/pull/11352


   Using the "--no-cache-dir" flag with pip install makes sure that packages
   downloaded by pip are not cached on the system. This is a best practice that
   makes sure packages are fetched from the repository instead of a local
   cache. Further, in the case of Docker containers, disabling the cache
   reduces the image size. The saving depends on the number of Python
   packages and their respective sizes: for heavy packages with a lot
   of dependencies, not caching pip packages reduces the size a lot.
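   
   As a minimal sketch of what the flag does in a single-stage image (the base image and the package below are placeholders, not taken from the Airflow Dockerfile):
   
   ```dockerfile
   # Hypothetical single-stage image, for illustration only.
   FROM python:3.8-slim
   # --no-cache-dir tells pip not to keep downloaded files in its cache
   # directory, so they are not stored in this image layer.
   RUN pip install --no-cache-dir requests
   ```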
   
   Further details can be found at
   
   https://medium.com/sciforce/strategies-of-docker-images-optimization-2ca9cc5719b6
   
   Signed-off-by: Pratik Raj <ra...@gmail.com>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #11352:
URL: https://github.com/apache/airflow/pull/11352#issuecomment-705641828


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst).
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally. It's a heavy Docker image, but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   





[GitHub] [airflow] Rajpratik71 closed pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Posted by GitBox <gi...@apache.org>.
Rajpratik71 closed pull request #11352:
URL: https://github.com/apache/airflow/pull/11352


   





[GitHub] [airflow] potiuk commented on pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11352:
URL: https://github.com/apache/airflow/pull/11352#issuecomment-705725610


   
   > On closer examination, I noticed that it is a multi-stage Docker build with a build image and the main image. All the dependencies are installed in the builder image, so there is no need for this change: after the build, only the main image is used and pushed.
   
   @Rajpratik71 Exactly. Disabling the cache is not a good idea here. We have a multi-stage build, and the "pip install" step is done in the "build" stage. Only the installed Python libraries from "${HOME}/.local" are then copied to the final image using COPY --from. It's actually better to keep the pip cache, because it makes rebuilds of the image much faster.
   
   In the build stage we run pip install twice: the first time to install the "current master" dependencies and then, when we build the image, with the actual dependencies from sources. This way we get faster rebuilds when setup.py changes, and we do not have to reinstall everything from scratch when we iterate on the image (for example when we are running Kubernetes tests). So removing the cache in this case is not a good idea at all.
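   
   The layout described above can be sketched roughly as follows. This is a simplified illustration, not the actual Airflow Dockerfile: the base image, the `requests` package and the `requirements.txt` path are hypothetical stand-ins.
   
   ```dockerfile
   # Hypothetical two-part build, for illustration only.
   FROM python:3.8-slim AS build
   # First install: a baseline set of dependencies. Docker reuses this layer
   # while the instruction is unchanged, and pip keeps its download cache
   # (under /root/.cache/pip when running as root).
   RUN pip install --user requests
   # Second install: the actual dependencies from the local sources. Packages
   # already installed or cached by the first step are not downloaded again.
   COPY requirements.txt /tmp/requirements.txt
   RUN pip install --user -r /tmp/requirements.txt
   
   FROM python:3.8-slim AS main
   # Only the installed libraries are copied over; the pip cache stays behind
   # in the build part and never reaches the final image.
   COPY --from=build /root/.local /root/.local
   ENV PATH="/root/.local/bin:${PATH}"
   ```
   
   With a layout like this, passing "--no-cache-dir" would not shrink the final image, because the cache directory is never copied into it.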
   
   
   





[GitHub] [airflow] Rajpratik71 commented on pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Posted by GitBox <gi...@apache.org>.
Rajpratik71 commented on pull request #11352:
URL: https://github.com/apache/airflow/pull/11352#issuecomment-705664642


   > > > That's a good idea actually
   > > 
   > > 
   > > It's what we use so we don't have to worry about adding that flag in all our dependent images 🙂
   > 
   > :) @Rajpratik71 Can you update the PR to use the env var instead?
   
   On closer examination, I noticed that it is a multi-stage Docker build with a build image and the main image. All the dependencies are installed in the builder image, so there is no need for this change: after the build, only the main image is used and pushed.
   
   So there is no need for this PR.
   
   Hence, closing.





[GitHub] [airflow] kaxil commented on pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #11352:
URL: https://github.com/apache/airflow/pull/11352#issuecomment-705651083


   > Perhaps it might be better to add the no cache dir environment variable, so all pip instructions automatically refrain from using the cache without having to explicitly add it to each call?
   > 
   > ```
   > ENV PIP_NO_CACHE_DIR="1"
   > ```
   
   That's a good idea actually 





[GitHub] [airflow] madison-ookla commented on pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Posted by GitBox <gi...@apache.org>.
madison-ookla commented on pull request #11352:
URL: https://github.com/apache/airflow/pull/11352#issuecomment-705650401


   Perhaps it might be better to add the no cache dir environment variable, so all pip instructions automatically refrain from using the cache without having to explicitly add it to each call?
   ```
   ENV PIP_NO_CACHE_DIR="1"
   ```





[GitHub] [airflow] madison-ookla commented on pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Posted by GitBox <gi...@apache.org>.
madison-ookla commented on pull request #11352:
URL: https://github.com/apache/airflow/pull/11352#issuecomment-705653919


   > That's a good idea actually
   
   It's what we use so we don't have to worry about adding that flag in all our dependent images 🙂
   
   





[GitHub] [airflow] kaxil commented on pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #11352:
URL: https://github.com/apache/airflow/pull/11352#issuecomment-705655308


   > > That's a good idea actually
   > 
   > It's what we use so we don't have to worry about adding that flag in all our dependent images 🙂
   
   :) @Rajpratik71 Can you update the PR to use the env var instead?





[GitHub] [airflow] Rajpratik71 commented on pull request #11352: chore: use `--no-cache-dir` flag to `pip` in dockerfiles, to save space

Posted by GitBox <gi...@apache.org>.
Rajpratik71 commented on pull request #11352:
URL: https://github.com/apache/airflow/pull/11352#issuecomment-705668190


   > Perhaps it might be better to add the no cache dir environment variable, so all pip instructions automatically refrain from using the cache without having to explicitly add it to each call?
   > 
   > ```
   > ENV PIP_NO_CACHE_DIR="1"
   > ```
   
   Note that in old versions of pip this variable causes conflicts, giving the errors mentioned at [pypa/pip/issues/5385](https://github.com/pypa/pip/issues/5385) and [pypa/pip/issues/5735](https://github.com/pypa/pip/issues/5735).
   
   This is fixed in the latest versions; see the [pip changelog](https://pip.pypa.io/en/stable/news/#id333).

