Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/08 11:18:50 UTC

[GitHub] [airflow] potiuk opened a new issue #8785: Move out all unnecessary binaries installed in CI image

potiuk opened a new issue #8785:
URL: https://github.com/apache/airflow/issues/8785


   **Description**
   
   We have a lot of unnecessary binaries/libraries baked into the CI image. We can move them out in the form of integration images that docker-compose downloads separately.
   
   Those are:
   
   - [ ] Kind - not needed once we move it out of the container #8782
   - [ ] kubectl - not needed once we move Kind out of the container #8782
   - [ ] Hadoop distro - could most likely be a separate Hadoop container started with `docker run` #8783
   - [ ] Minicluster - could most likely be a separate Minicluster container started with `docker run` #8784
   - [ ] Singularity - again, could be a separate `docker run` integration #8774
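To track progress on the checklist above from inside a running CI container, one could check which of these tools are still on `PATH`. Below is a minimal Python sketch. The executable names (`kind`, `kubectl`, `hadoop`, `singularity`) are assumptions for illustration. Minicluster is left out because it may not ship as a single executable.

```python
"""Sketch: report which of the binaries listed in this issue are still on PATH.

Assumption: the executable names below are illustrative, not taken from
Airflow's Dockerfile.
"""
import shutil

# Hypothetical list of executables this issue proposes to move out of the image.
CANDIDATE_BINARIES = ["kind", "kubectl", "hadoop", "singularity"]


def find_baked_in(binaries):
    """Map each binary name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in binaries}


if __name__ == "__main__":
    for name, path in find_baked_in(CANDIDATE_BINARIES).items():
        status = path if path else "not found (already moved out?)"
        print(f"{name:12} {status}")
```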
   
   **Use case / motivation**
   
   
   **Related Issues**
   
   #8782, #8783, #8784, #8774


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-631957581


   /cc @jhtimmins 



[GitHub] [airflow] potiuk edited a comment on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-632003742


   We can also run those from Docker images.
   
   We now have the Docker socket mapped inside the Breeze container, so we can do `docker pull google/cloud-sdk:latest`. We could even write simple scripts that do this seamlessly the first time you run the `gcloud` command.
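The "simple scripts" idea could look roughly like the Python sketch below. It is a shim that pulls `google/cloud-sdk:latest` on first use and forwards its arguments to `gcloud` inside that image. Several details are assumptions and do not come from Airflow's code: the function names, mounting `~/.config/gcloud` for credentials, and the image running as root.

```python
"""Sketch of a `gcloud` shim that runs the CLI from the google/cloud-sdk image.

Assumptions (hypothetical, not from Airflow's code): the shim replaces a
locally installed `gcloud`, the Docker socket is reachable, and the image
keeps its gcloud config under /root/.config/gcloud.
"""
import os
import subprocess
import sys

IMAGE = "google/cloud-sdk:latest"


def image_present(image):
    """Return True if `image` is already in the local Docker image cache."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def build_command(args, image=IMAGE, config_dir="~/.config/gcloud"):
    """Build the `docker run` argv that forwards `args` to gcloud in the image."""
    # With a shared Docker socket, the daemon resolves the -v source path on
    # the host, so config_dir must be a path that exists on the host.
    config_dir = os.path.expanduser(config_dir)
    return [
        "docker", "run", "--rm", "-i",
        "-v", f"{config_dir}:/root/.config/gcloud",  # keep auth between calls
        image, "gcloud", *args,
    ]


def main(argv=None):
    """Forward argv (default: sys.argv[1:]) to the containerised gcloud."""
    args = sys.argv[1:] if argv is None else argv
    # Pull the image only the first time the shim is used.
    if not image_present(IMAGE):
        subprocess.run(["docker", "pull", IMAGE], check=True)
    return subprocess.run(build_command(args)).returncode

# In a shim script saved as `gcloud` on PATH, finish with:
#   if __name__ == "__main__":
#       sys.exit(main())
```

Keeping `build_command` separate from `main` means the argument forwarding can be checked without a Docker daemon.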
   



[GitHub] [airflow] potiuk commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-632004132


   That's basically what I planned to do with all the binaries, but it requires a bit of work. If anyone can pick this up, I am more than happy to review :)



[GitHub] [airflow] potiuk commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-638989154


   Closed by #9129
   



[GitHub] [airflow] mik-laj commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-631970084


   @ashb On CI, we don't use it yet. 



[GitHub] [airflow] ashb commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-631960980


   @mik-laj That's why I asked about CI, not system tests.
   
   A large part of the slowdown in our tests was pulling images, so if we can reduce the size of that image we can speed our builds up.
   
   The trade-off is deciding whether maintaining a second image is worth it.



[GitHub] [airflow] mik-laj commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-631959453


   @ashb We use `gcloud` and `aws-cli` in system tests, where we don't always have an operator. `gcloud` is even used by Airflow operators:
   https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/kubernetes_engine.py#L267
   



[GitHub] [airflow] potiuk commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-638444079


   I think docker can be removed as well. Let me see.



[GitHub] [airflow] ashb commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-631957488


   @potiuk Do our _CI_ tests need the `gcloud` or `aws` CLI tools? Between them they add something like 500MB to the image.
   
   (It would be _really_ nice if docker let you easily compose layers from different images, without having to rebuild it)



[GitHub] [airflow] ashb commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-631970936


   Might be worth investigating, at least once we've gotten rid of all the items on this list, which are far bigger.
   
   https://github.com/wagoodman/dive is a useful tool for exploring Docker images.



[GitHub] [airflow] potiuk commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-638443708


   kubectl is already gone after merging #8265.



[GitHub] [airflow] ashb commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-638221700


   Next chunkiest layers (in order they appear in the Dockerfile):
   
   - 413MB - Installing node and yarn
   - 385MB - docker-ce
   - 43MB - kubectl
   - 438MB - gcloud-sdk (aws CLI is only 72MB)
   - 1.4GB - all of the Python deps (`pip install -e .[$AIRFLOW_EXTRAS]` - not much we can do about that, I don't think)
   - 219MB - node deps (`yarn --cwd airflow/www install --frozen-lockfile --no-cache`)
   
   (Not saying we can/should get rid of all of those, just listing large layers)
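A similar ranking of large layers can be produced from `docker history`. The minimal Python sketch below assumes the input was generated with `docker history --no-trunc --human=false --format ...`, where the template has a tab between `{{.Size}}` and `{{.CreatedBy}}`. `--human=false` makes Docker print sizes as plain byte counts. The function names and the 10 MiB threshold are hypothetical choices for illustration.

```python
"""Sketch: rank Docker image layers by size from `docker history` output."""
import subprocess


def parse_history(text):
    """Parse '<bytes><TAB><command>' lines into (size_bytes, command) tuples.

    Lines that do not start with a byte count (e.g. continuation lines of a
    multi-line RUN command) are skipped.
    """
    layers = []
    for line in text.splitlines():
        size, sep, command = line.partition("\t")
        if not sep or not size.strip().isdigit():
            continue
        layers.append((int(size), command.strip()))
    return layers


def largest_layers(layers, top=6, min_bytes=10 * 1024 * 1024):
    """Return up to `top` layers of at least `min_bytes`, largest first."""
    big = [layer for layer in layers if layer[0] >= min_bytes]
    return sorted(big, key=lambda layer: layer[0], reverse=True)[:top]


def history_for(image):
    """Run `docker history` for `image` and return its raw text output."""
    fmt = "{{.Size}}\t{{.CreatedBy}}"  # a real tab character separates fields
    return subprocess.run(
        ["docker", "history", "--no-trunc", "--human=false", "--format", fmt, image],
        check=True, capture_output=True, text=True,
    ).stdout
```

For example, `largest_layers(parse_history(history_for("my-ci-image")))` would list the biggest layers of a hypothetical local image named `my-ci-image`. For interactive exploration, `dive` gives a richer, per-file view.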



[GitHub] [airflow] potiuk closed issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #8785:
URL: https://github.com/apache/airflow/issues/8785


   



[GitHub] [airflow] potiuk commented on issue #8785: Move out all unnecessary binaries installed in CI image

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8785:
URL: https://github.com/apache/airflow/issues/8785#issuecomment-632008036


   @ashb @mik-laj -> issues added for gcloud and awscli. Feel free to implement them if you have time.

