Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/04 17:41:57 UTC

[GitHub] [airflow] potiuk opened a new pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

potiuk opened a new pull request #15187:
URL: https://github.com/apache/airflow/pull/15187


   While checking the test status of various CI tests we came to
   the conclusion that the Presto integration took a lot of memory
   (~1GB) and was the main source of failures during integration
   tests, especially with MySQL 8. The attempt to fine-tune the
   memory used led to the discovery that Presto stopped publishing
   their Docker image (prestosql/presto) - apparently in the
   aftermath of splitting off Trino from Presto.
   
   The split-off was already discussed in #14281 and it was planned
   to add support for Trino (which is the more community-driven
   fork of Presto - Presto remained under Facebook governance,
   while Trino is an effort continued by the original creators).
   
   You can read more about it in the announcement:
   https://trino.io/blog/2020/12/27/announcing-trino.html. While
   Presto continues its way under The Linux Foundation, Trino
   lives its own life and keeps maintaining all artifacts and
   libraries (including the Docker image). That allowed us to
   update our tests and decrease the memory footprint by around
   400MB.
   
   This commit:
   
   * adds the new Trino provider (a short usage sketch follows below)
   * removes the `presto` integration and replaces it with `trino`
   * the `trino` integration image is built with 400MB lower memory
     requirements and is published as `apache/airflow:trino-*`
   * moves the integration tests from Presto to Trino
   
   Fixes: #14281
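   
   For illustration, here is a minimal usage sketch of the new provider
   (assuming the hook mirrors the existing Presto hook's `DbApiHook`
   interface and that a `trino_default` connection is configured - both
   are assumptions, not something shown in this diff):
   
   ```python
   # Sketch only: the import path and the default connection id are
   # assumptions based on the provider layout described above.
   from airflow.providers.trino.hooks.trino import TrinoHook
   
   hook = TrinoHook()  # falls back to the 'trino_default' connection
   # Run a trivial query to verify connectivity to the Trino cluster.
   records = hook.get_records("SELECT 1")
   print(records)
   ```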
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813529506


   [The Workflow run](https://github.com/apache/airflow/actions/runs/719943221) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] dungdm93 commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
dungdm93 commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r606895250



##########
File path: scripts/ci/docker-compose/integration-trino.yml
##########
@@ -17,10 +17,10 @@
 ---
 version: "2.2"
 services:
-  presto:
-    image: apache/airflow:presto-2020.10.08
-    container_name: presto
-    hostname: presto
+  trino:
+    image: apache/airflow:trino-2021.04.04

Review comment:
       I'd like to recommend using the official Trino image and mounting `entrypoint.sh` at runtime instead of building a custom image:
   ```yaml
     trino:
       image: trinodb/trino:354
       entrypoint: ["/usr/local/bin/entrypoint.sh"]
       command: ["/usr/lib/trino/bin/run-trino"]
       volumes:
       - ../dockerfiles/trino/entrypoint.sh:/usr/local/bin/entrypoint.sh:ro
   ```







[GitHub] [airflow] mik-laj edited a comment on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813457835


   > It looks like I have everything right ... but it fails ...
   
   It is true. You have the correct code, but the cache contains outdated information. To prevent such problems, we build documentation twice in some situations (see: https://github.com/apache/airflow/pull/12527#issuecomment-731568418 and https://github.com/apache/airflow/blob/4099108f554130cf3f87ba33b9d6084a74e70231/docs/build_docs.py#L545-L563). Unfortunately, there is another problem. We have a smart code preview in the errors summary, which cannot display the code if it is automatically generated (see: https://github.com/apache/airflow/blob/ef4af21/docs/apache-airflow-providers/packages-ref.rst).
   
   I pushed a fix.





[GitHub] [airflow] potiuk commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813473087


   > It is true. You have the correct code, but the cache contains outdated information. To prevent such problems, we build documentation twice in some situations (see: [#12527 (comment)](https://github.com/apache/airflow/pull/12527#issuecomment-731568418) and .
   
   Completely forgot about it :). Thanks! Now I have to deal with the numpy/pandas/dask dependencies that dropped support for Python 3.6 (and some new versions were released today), but I will deal with that in a separate PR shortly.





[GitHub] [airflow] potiuk commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813566818


   > ) Unfortunately, here is another problem. We have a smart code preview in errors summary, which cannot display the code if it is automatically generated (see:: https://github.com/apache/airflow/blob/ef4af21/docs/apache-airflow-providers/packages-ref.rst).
   
   It has not helped @mik-laj :(





[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813088269


   [The Workflow run](https://github.com/apache/airflow/actions/runs/717227521) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] potiuk commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813532850


   @kaxil - I added that one to 2.0.2 as well - even if it is not needed for the provider's release, it will be nice to get the `trino` extra in 2.0.2. I do not think it qualifies as a 'new feature' :). We discussed before that dependency changes are not "new features". This one, however, is a bit of an edge case :D.





[GitHub] [airflow] dungdm93 commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
dungdm93 commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r606892272



##########
File path: docs/apache-airflow-providers-trino/commits.rst
##########
@@ -0,0 +1,26 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Package apache-airflow-providers-trino
+------------------------------------------------------
+
+`Presto <https://trino.io/>`__

Review comment:
       ```suggestion
   `Trino <https://trino.io/>`__
   ```

##########
File path: scripts/ci/docker-compose/integration-trino.yml
##########
@@ -17,10 +17,10 @@
 ---
 version: "2.2"
 services:
-  presto:
-    image: apache/airflow:presto-2020.10.08
-    container_name: presto
-    hostname: presto
+  trino:
+    image: apache/airflow:trino-2021.04.04

Review comment:
       I'd like to recommend using the official Trino image and mounting `entrypoint.sh` at runtime instead of building a custom image:
   ```yaml
     trino:
       image: trinodb/trino:353
       entrypoint: ["/usr/local/bin/entrypoint.sh"]
       command: ["/usr/lib/trino/bin/run-trino"]
       volumes:
    - ../dockerfiles/trino/entrypoint.sh:/usr/local/bin/entrypoint.sh:ro
   ```







[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813100801


   [The Workflow run](https://github.com/apache/airflow/actions/runs/717414629) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] potiuk commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r606836896



##########
File path: airflow/providers/qubole/operators/qubole.py
##########
@@ -87,6 +87,10 @@ class QuboleOperator(BaseOperator):
             :query: inline query statement
             :script_location: s3 location containing query statement
             :macros: macro values which were used in query
+        prestocmd:
+            :query: inline query statement
+            :script_location: s3 location containing query statement
+            :macros: macro values which were used in query

Review comment:
       Correct! I was adding it and checked that there is no such command (you should still use presto), but then forgot to remove it :) Good eye!
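   
   For reference, a hedged sketch of how a Presto command would still be
   submitted through Qubole (parameter names follow the docstring shown in
   the diff above; the `qubole_default` connection id is an assumption):
   
   ```python
   # Sketch only: Qubole keeps using the Presto command type - there is
   # no separate Trino command type.
   from airflow.providers.qubole.operators.qubole import QuboleOperator
   
   run_presto_query = QuboleOperator(
       task_id="presto_on_qubole",
       command_type="prestocmd",
       query="SELECT 1",
       qubole_conn_id="qubole_default",  # assumed default connection id
   )
   ```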







[GitHub] [airflow] potiuk commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r607024725



##########
File path: scripts/ci/docker-compose/integration-trino.yml
##########
@@ -17,10 +17,10 @@
 ---
 version: "2.2"
 services:
-  presto:
-    image: apache/airflow:presto-2020.10.08
-    container_name: presto
-    hostname: presto
+  trino:
+    image: apache/airflow:trino-2021.04.04

Review comment:
       Thanks @dungdm93 for the comment. We had similar discussions in the past, and I took the position that we should have copies of all the CI images in the airflow repo. And now it is not only my strong opinion - we have proof that it was a good decision.
   
   First of all - beware, this is a long answer :). This is only because I try to explain and write it down also for myself - I am just about to start a similar project in my daily work, so I am using this opportunity to make my reasoning clear and to have some good arguments written down and my thoughts wrapped around them.
   
   Also @kaxil @ashb @dimberman @turbaszek @mik-laj -> I think it might be good for you to read this; it might be an interesting study for the future of any of the projects/companies we work together on :).
   
   Why do we want to have our own copies of 3rd-party, non-official images?
   
   The Presto/Trino situation has shown exactly why my decision to copy the images was good. One day, all of a sudden, the original `prestosql/presto` image disappeared. This has already happened - we witnessed it. We do not know exactly when - it was some time between January and April 2021. But it did disappear one day.
   
   But for us ... it did not matter. We were completely unaffected by it. Our CI continued to work. No disruption. We even added features to the presto provider without any need to change the CI process at all. This is the effect. And I had foreseen it when I migrated all the CI images which were not "official" images to the ones that are in the `apache/airflow` registry.
   
   Usually the CI system is maintained by a handful of people and used by many, and if it breaks for whatever reason, everyone is affected but only that handful of people can fix it. Everyone wants to use the CI process, but very, very, very few have the time, skills and will to understand how it works. And when it breaks - literally everyone around expects the people who maintain the CI to fix it. No matter what language it is, no matter how complex it is - people expect it to work, and when it does not, they will escalate the problem rather than try to understand the root cause and fix it.
   
   Note that this is not a complaint - not at all. This is just a statement of fact. I've seen it in every single place I've been - no matter if it was Jenkins, Travis, CircleCI, GitHub Actions, Bamboo, a custom CI, you name it. No matter if it was a couple of scripts in bash, a Python-based build system, or just a bunch of custom "yaml" CI-specific tags put together. It does not matter. The developers expect the CI system to work, and very, very, very rarely will they spend any time on fixing any problems there - because this is not their area of expertise and they do not interact with it on a "daily" basis. No matter how hard you try - for 20 years I've seen this happening (and I've seen it happening as a creator of such systems, a user, a manager of projects using them - you name it). So it's not just `Jarek's complex CI systems`. It's `all CI systems out there`. It's just a fact of life.
   
   In effect, this means that if there is a 3rd party that can break the CI, that handful of people can be put into 'emergency' mode by a 3rd-party decision, backed by all the users of the CI who will 'complain' and expect a fix to happen. Not good. So for me, the stability of the CI and its independence from 3rd parties trumps `ease of changes`, and often it even trumps `simplicity`. Big time.
   
   So after this long reasoning - in the case of our own copies of images - imagine what would have happened if we did not have our own image. All of a sudden, one day, without any warning, all the PRs of all the people contributing to airflow would start to fail the moment the image was removed. And we would have to scramble to find a solution.
   
   When this happened, the Trino image was likely not yet available as a quick fix, or not mature enough. And we would likely have had to reverse-engineer the Dockerfile commands from the original image, build our own, and deal with potential incompatibilities and missing configuration and entrypoint scripts that we would have had to extract from a copy of the image (if we were lucky enough to find an existing image somewhere). And it took quite a few hours (and this PR) to switch to the Trino image.
   
   Or maybe we would have had to dig through the git history of prestodb to find out where the scripts are and how they built the image - hoping that it was not a manual process, that it is well documented and that it is actually reproducible. And all this in a big hurry, because of a number of people complaining that our integration tests are broken. Those complaints would likely mean some mitigations and updates to our CI process to exclude presto from our CI build, which would mean that even more time is needed in "emergency" mode from that handful of people. And often those things happen while that handful of people are in the middle of an important work project they have to deliver, or have a difficult (or happy) family situation to handle, and have no extra time or cycles to suddenly fix something that was caused by a 3rd party's decision to pull the image.
   
   It's the same reason why I usually do not pick up phone calls from unknown numbers. I do not want others to dictate how I spend my time, or others' problems to become my emergencies. I saw a very nice quote recently: "Your lack of planning is not a reason for my emergency". This is the very thing I hate: when someone's bad decisions or lack of planning and communication cause my emergencies. And I will always fiercely and assertively oppose it, and will do a lot to prevent it.
   
   I do not want those things to happen. If I can avoid it and have the CI under my project's full control, I'll do it. I do not want 3rd-party decisions to impact our CI stability. Too many people depend on it, we already have too many reasons for instability - and we are continuously fighting to get it under control.
   
   Yes - we have to rely on something as a base, and I can rely on, for example, the 'debian:buster-slim' or `python` images. They are both "official" DockerHub images (https://docs.docker.com/docker-hub/official_images/), they are maintained by the DockerHub people (on top of the original maintainers), and you can be sure they will be there when needed (barring temporary problems like those we experienced over the last 48 hours - but that was a temporary incident). They come with SLAs and nobody will just `pull the plug` as happened with presto.
   
   Last point - when it comes to understanding/building the image - we are using a standard `FROM:`-based build and I think it's super easy to build the image: simply run `./build_and_push.sh` (it fails on push if you have no rights to push it). That's it. No more, no less. I used it when I iterated on the memory parameters and, while "slightly" annoying, it is actually a very simple and fast workflow (building and even pushing the image incrementally, when you have just changed the `entrypoint`, takes seconds as only the last layer is affected). And this image and the entrypoint are not going to change more frequently than every few months (the last change to the presto image build was done on the 8th of January). Once we have the memory under control, there is no reason to modify the entrypoint other than regular refactoring/maintenance and the need to upgrade the base image version. So there is no real need to support the 'mounted script' case.







[GitHub] [airflow] potiuk merged pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #15187:
URL: https://github.com/apache/airflow/pull/15187


   





[GitHub] [airflow] marcosmarxm commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
marcosmarxm commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r607823728



##########
File path: INSTALL
##########
@@ -106,7 +106,7 @@ github_enterprise, google, google_auth, grpc, hashicorp, hdfs, hive, http, imap,
 jira, kerberos, kubernetes, ldap, microsoft.azure, microsoft.mssql, microsoft.winrm, mongo, mssql,
 mysql, neo4j, odbc, openfaas, opsgenie, oracle, pagerduty, papermill, password, pinot, plexus,
 postgres, presto, qds, qubole, rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry,

Review comment:
       ```suggestion
   postgres, qds, qubole, rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry,
   ```
   I think this needs to be removed manually, right?







[GitHub] [airflow] potiuk commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813071539


   @mik-laj -> I believe it might fail during docs building (the "trino
   provider inventory missing" error)? Let's see.
   
   However, this one (on top of switching to Trino) will also likely improve
   the stability of the tests (400 MB less memory used during integration
   tests).





[GitHub] [airflow] potiuk commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813330384


   @mik-laj - I need your help here. I can't understand why the docs build
   fails with a missing `trino_to_gcs` :(. It looks like I have everything
   right ... but it fails ...





[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813626346


   [The Workflow run](https://github.com/apache/airflow/actions/runs/720297980) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813457835


   > It looks like I have everything right ... but it fails ...
   
   It is true. You have the correct code, but the cache contains outdated information. To prevent such problems, we build the documentation twice in some situations (see: https://github.com/apache/airflow/pull/12527#issuecomment-731568418 and https://github.com/apache/airflow/blob/4099108f554130cf3f87ba33b9d6084a74e70231/docs/build_docs.py#L545-L563). Unfortunately, there is another problem. We have a smart code preview in the errors summary, which cannot display the code if it is automatically generated (see: https://github.com/apache/airflow/blob/ef4af21/docs/apache-airflow-providers/packages-ref.rst).
   
   I pushed fixes.
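   
   To illustrate the "build twice" idea in isolation (a rough sketch only - the real logic lives in `docs/build_docs.py`, and the helper below is invented for the example):
   
   ```python
   # Rough sketch of "build the docs twice when the cache may be stale".
   # `build_all_docs` stands in for the real Sphinx build step and is invented
   # for this example; it returns a list of error strings.
   from typing import Callable, List


   def build_with_retry(build_all_docs: Callable[[], List[str]]) -> List[str]:
       errors = build_all_docs()  # first pass also refreshes cached inventories
       if errors:
           # Second pass: errors caused only by outdated caches disappear,
           # while real errors are reported as usual.
           errors = build_all_docs()
       return errors


   if __name__ == "__main__":
       attempts = iter([["stale cross-reference in provider index"], []])
       print(build_with_retry(lambda: next(attempts)))  # -> []
   ```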


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813071801


   [The Workflow run](https://github.com/apache/airflow/actions/runs/717117322) is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dungdm93 commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
dungdm93 commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r606895250



##########
File path: scripts/ci/docker-compose/integration-trino.yml
##########
@@ -17,10 +17,10 @@
 ---
 version: "2.2"
 services:
-  presto:
-    image: apache/airflow:presto-2020.10.08
-    container_name: presto
-    hostname: presto
+  trino:
+    image: apache/airflow:trino-2021.04.04

Review comment:
       I'd like to recommend using the official Trino image and mounting `entrypoint.sh` at runtime instead of building a custom image:
   ```yaml
     trino:
       image: trinodb/trino:353
       entrypoint: ["/usr/local/bin/entrypoint.sh"]
       command: ["/usr/lib/trino/bin/run-trino"]
       volumes:
       - ../dockerfiles/trino/entrypoint.sh:/usr/local/bin/entrypoint.sh:ro
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813554493


   [The Workflow run](https://github.com/apache/airflow/actions/runs/720032744) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r607745424



##########
File path: scripts/ci/docker-compose/integration-redis.yml
##########
@@ -21,7 +21,7 @@ services:
     image: redis:5.0.1
     volumes:
       - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
-      - redis-db-volume:/data/presto
+      - redis-db-volume:/data/redis

Review comment:
       Is this related to adding the Trino provider, or is it just a fix along the way?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813071521


   [The Workflow run](https://github.com/apache/airflow/actions/runs/717113093) is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813990213


   It's green!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r607024725



##########
File path: scripts/ci/docker-compose/integration-trino.yml
##########
@@ -17,10 +17,10 @@
 ---
 version: "2.2"
 services:
-  presto:
-    image: apache/airflow:presto-2020.10.08
-    container_name: presto
-    hostname: presto
+  trino:
+    image: apache/airflow:trino-2021.04.04

Review comment:
       Thanks @dungdm93 for the comment. We had similar discussions in the past, and I took the position that we should have copies of all the CI images in the airflow repo. And now it's not only my strong opinion - we have proof that it was a good decision.
   
   First of all - beware, this is a long answer :). This is only because I am trying to explain it and write it down also for myself - I am just about to start a similar project in my daily work, so I am using this opportunity to make my reasoning clear and to have some good arguments written down and my thoughts wrapped around them.
   
   Also @kaxil @ashb @dimberman @turbaszek @mik-laj -> I think it might be good if you read this; it might be an interesting study for the future of any of the projects/companies we work on together :).
   
   Why do we want to have our own copies of 3rd-party, non-official images?
   
   The Presto/Trino situation has shown exactly why my decision to copy the images was a good one. One day, out of the blue, the original `prestosql/presto` image disappeared. This has already happened - we witnessed it. We do not know exactly when - it was some time between Jan and April 2021. But it did disappear one day.
   
   But for us ... it did not matter. We were completely unaffected by it. Our CI continued to work. No disruption. We even added features to the Presto provider without any need to change the CI process at all. This is the effect, and I had foreseen it when I migrated all the CI images that were not "official" images to the ones in the `apache/airflow` registry.
   
   Usually the CI system is maintained by a handful of people and used by many, and if it breaks for whatever reason, everyone is affected but only that handful of people can fix it. Everyone wants to use the CI process, but very, very, very few have the time, skills and will to understand how it works. And when it breaks - literally everyone around expects the people who maintain the CI to fix it. No matter what language it is, no matter how complex it is - people expect it to work, and when it does not, they will escalate the problem rather than try to understand the root cause and fix it.
   
   Note that this is not a complaint - not at all. It is just a statement of fact. I've seen it in every single place I've been - no matter if it was Jenkins, Travis, CircleCI, GitHub Actions, Bamboo, a custom CI, you name it. No matter if it was a couple of scripts in bash, a python-based build system, or just a bunch of custom "yaml" ci-specific tags put together. It does not matter. Developers expect the CI system to work, and very, very, very rarely will they spend any time on fixing problems there - because this is not their area of expertise and they do not interact with it on a "daily" basis. No matter how hard you try - for 20 years I've seen this happening (and I've seen it as a creator of such systems, a user, and a manager of projects using them - you name it). So it's not just `Jarek's complex CI systems`. It's `all CI systems out there`. It's just a fact of life.
   
   In effect, this means that if there is a 3rd party who can break the CI, that handful of people can be put in 'emergency' mode by a 3rd-party decision, backed by all the users of the CI who will 'complain' and expect the fix to happen. Not good. So for me, the stability of the CI and its independence from 3rd parties trumps `ease of change`, and often it even trumps `simplicity`. Big time.
   
   So after this long reasoning - back to the case of our own copies of images - imagine what would happen if we did not have our own image. All of a sudden, one day, without any warning, the PRs of all the people contributing to Airflow would start to fail the moment the image was removed. And we would have to scramble to find a solution.
   
   When this happened, the Trino image was likely not yet available as a quick fix, or not mature enough. We would likely have to reverse engineer the Dockerfile commands from the original image, build our own, and deal with potential incompatibilities and missing configuration and entrypoint scripts that we would have to extract from a copy of the image (if we were lucky enough to find an existing image somewhere). Even so, it took quite a few hours (and this PR) to switch to the Trino image.
   
   Or maybe we would have to dig through the git history of the Presto repository to find out where the scripts are and how they built the image - hoping that it was not a manual process, that it is well documented and that it is actually reproducible. And all this in a big hurry, because of a number of people complaining that our integration tests are broken. Those complaints would likely mean some mitigation and updates to our CI process to exclude Presto from our CI build, which means even more time spent in "emergency" mode by that handful of people. And often those things happen while that handful of people are in the middle of an important work project they have to deliver, or have a difficult (or happy) family situation to handle, and they have no extra time or cycles to suddenly fix something that was caused by a 3rd party's decision to pull the image.
   
   It's the same reason why I usually do not pick up phone calls from unknown numbers. I do not want others to dictate how I spend my time, or other people's problems to become my emergencies. I saw a very nice quote recently: "Your lack of planning is not a reason for my emergency". That is the very thing I hate - when someone's bad decisions, or lack of planning and communication, cause my emergencies. And I will always fiercely and assertively oppose it, and will do a lot to prevent it.
   
   I do not want those things to happen. If I can avoid it and keep the CI under my project's full control, I'll do it. I do not want to let 3rd-party decisions impact our CI stability. Too many people depend on it, we already have too many reasons for instability - and we are continuously fighting to get it under control.
   
   Yes - we have to rely on something as a base, and I can rely on, for example, the `debian:buster-slim` or `python` images. They are both "official" images on DockerHub (https://docs.docker.com/docker-hub/official_images/), they are maintained by the DockerHub people (on top of the original maintainers), and you can be sure they will be there when needed (barring temporary problems like those we experienced over the last 48 hours - but that was a temporary incident). They come with SLAs and nobody will just `pull the plug` as happened with Presto.
   
   Last point - when it comes to understanding/building the image - we are using a standard `FROM`-based build and I think it's super easy to build the image - simply run `./build_and_push.sh` (it fails on push if you have no rights to push it). That's it. No more, no less. I used it when I iterated on the memory parameters and, while "slightly" annoying, it is actually a very simple and fast workflow (building and even pushing the image incrementally, when you have just changed the `entrypoint`, takes seconds as only the last layer is affected). And this image and the entrypoint are not going to change more frequently than every few months (the last change to the Presto image build was done on the 8th of Jan). There is no reason, other than regular refactoring/maintenance and the need to upgrade the base image version, to modify the entrypoint once we have the memory under control. So there is no real need to support the 'mounted script' case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r606833549



##########
File path: airflow/providers/trino/hooks/trino.py
##########
@@ -0,0 +1,191 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import os
+from typing import Any, Iterable, Optional
+
+import trino
+from trino.exceptions import DatabaseError
+from trino.transaction import IsolationLevel
+
+from airflow import AirflowException
+from airflow.configuration import conf
+from airflow.hooks.dbapi import DbApiHook
+from airflow.models import Connection
+
+
+class TrinoException(Exception):
+    """Trino exception"""
+
+
+def _boolify(value):
+    if isinstance(value, bool):
+        return value
+    if isinstance(value, str):
+        if value.lower() == 'false':
+            return False
+        elif value.lower() == 'true':
+            return True
+    return value
+
+
+class TrinoHook(DbApiHook):
+    """
+    Interact with Trino through trino package.
+
+    >>> ph = TrinoHook()
+    >>> sql = "SELECT count(1) AS num FROM airflow.static_babynames"
+    >>> ph.get_records(sql)
+    [[340698]]
+    """
+
+    conn_name_attr = 'trino_conn_id'
+    default_conn_name = 'trino_default'
+    conn_type = 'trino'
+    hook_name = 'Trino'
+
+    def get_conn(self) -> Connection:
+        """Returns a connection object"""
+        db = self.get_connection(
+            self.trino_conn_id  # type: ignore[attr-defined]  # pylint: disable=no-member
+        )
+        extra = db.extra_dejson
+        auth = None
+        if db.password and extra.get('auth') == 'kerberos':
+            raise AirflowException("Kerberos authorization doesn't support password.")
+        elif db.password:
+            auth = trino.auth.BasicAuthentication(db.login, db.password)
+        elif extra.get('auth') == 'kerberos':
+            auth = trino.auth.KerberosAuthentication(
+                config=extra.get('kerberos__config', os.environ.get('KRB5_CONFIG')),
+                service_name=extra.get('kerberos__service_name'),
+                mutual_authentication=_boolify(extra.get('kerberos__mutual_authentication', False)),
+                force_preemptive=_boolify(extra.get('kerberos__force_preemptive', False)),
+                hostname_override=extra.get('kerberos__hostname_override'),
+                sanitize_mutual_error_response=_boolify(
+                    extra.get('kerberos__sanitize_mutual_error_response', True)
+                ),
+                principal=extra.get('kerberos__principal', conf.get('kerberos', 'principal')),
+                delegate=_boolify(extra.get('kerberos__delegate', False)),
+                ca_bundle=extra.get('kerberos__ca_bundle'),
+            )
+
+        trino_conn = trino.dbapi.connect(
+            host=db.host,
+            port=db.port,
+            user=db.login,
+            source=db.extra_dejson.get('source', 'airflow'),
+            http_scheme=db.extra_dejson.get('protocol', 'http'),
+            catalog=db.extra_dejson.get('catalog', 'hive'),
+            schema=db.schema,
+            auth=auth,
+            isolation_level=self.get_isolation_level(),  # type: ignore[func-returns-value]
+        )
+        if extra.get('verify') is not None:
+            # Unfortunately verify parameter is available via public API.
+            # The PR is merged in the trino library, but has not been released.
+            # See: https://github.com/trinosql/trino-python-client/pull/31

Review comment:
       ```suggestion
               # Unfortunately verify parameter is available via public API.
               # The PR is merged in the trino library, but has not been released.
               # See: https://github.com/trinodb/trino-python-client/pull/31
   ```
   BTW this PR is merged and released
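
    For anyone who wants to try the new hook against the `trino` integration container, here is a minimal sketch (the connection URI, catalog and query are illustrative assumptions, not defaults shipped with the provider):
    ```python
    import os

    # Airflow connections can be supplied as URIs via AIRFLOW_CONN_* environment
    # variables; the host, catalog and other query parameters here are made up for
    # illustration and map onto the `extra` keys read in get_conn() above.
    os.environ["AIRFLOW_CONN_TRINO_DEFAULT"] = (
        "trino://airflow@localhost:8080/default?catalog=tpch&protocol=http&verify=false"
    )

    from airflow.providers.trino.hooks.trino import TrinoHook

    hook = TrinoHook()  # uses trino_conn_id='trino_default' by default
    rows = hook.get_records("SELECT count(*) FROM tpch.tiny.nation")
    print(rows)  # e.g. [[25]] when the tpch catalog is available
    ```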

##########
File path: airflow/providers/qubole/operators/qubole.py
##########
@@ -87,6 +87,10 @@ class QuboleOperator(BaseOperator):
             :query: inline query statement
             :script_location: s3 location containing query statement
             :macros: macro values which were used in query
+        prestocmd:
+            :query: inline query statement
+            :script_location: s3 location containing query statement
+            :macros: macro values which were used in query

Review comment:
       This duplicates lines 86-89.
   There doesn't seem to be an equivalent `command_type` for trino:
   https://docs.qubole.com/en/latest/rest-api/airflow_api/qubole-operator-api.html
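
   For reference, the documented fields map onto operator kwargs roughly like this (an illustrative sketch only - `prestocmd` is the command type from the Qubole docs linked above, while the query, cluster label and connection id are placeholders):
    ```python
    from airflow.providers.qubole.operators.qubole import QuboleOperator

    # Illustrative only: query and cluster_label are placeholder values.
    run_presto = QuboleOperator(
        task_id="run_presto_query",
        command_type="prestocmd",
        query="SELECT count(*) FROM default.users",
        cluster_label="default",
        qubole_conn_id="qubole_default",
    )
    ```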







[GitHub] [airflow] dungdm93 commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
dungdm93 commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r606895250



##########
File path: scripts/ci/docker-compose/integration-trino.yml
##########
@@ -17,10 +17,10 @@
 ---
 version: "2.2"
 services:
-  presto:
-    image: apache/airflow:presto-2020.10.08
-    container_name: presto
-    hostname: presto
+  trino:
+    image: apache/airflow:trino-2021.04.04

Review comment:
       I'd like to recommend using the official trino image and mounting `entrypoint.sh` at runtime instead of building a custom image:
   ```yaml
     trino:
       image: trinodb/trino:354
       entrypoint: ["/usr/local/bin/entrypoint.sh"]
       command: ["/usr/lib/trino/bin/run-trino"]
       volumes:
       - ../dockerfiles/trino/entrypoint.sh:/usr/local/bin/entrypoint.sh:ro
   ```
   It's much easier to understand and much easier to upgrade :100: 







[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813466426


   [The Workflow run](https://github.com/apache/airflow/actions/runs/719636407) is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.





[GitHub] [airflow] potiuk commented on a change in pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#discussion_r608038303



##########
File path: scripts/ci/docker-compose/integration-redis.yml
##########
@@ -21,7 +21,7 @@ services:
     image: redis:5.0.1
     volumes:
       - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
-      - redis-db-volume:/data/presto
+      - redis-db-volume:/data/redis

Review comment:
       Just a fix on the way. It has no real consequences other than that the redis data will be kept separate from the presto data during testing :)







[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813100723


   [The Workflow run](https://github.com/apache/airflow/actions/runs/717368051) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] potiuk edited a comment on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813473087


   > It is true. You have the correct code, but the cache contains outdated information. To prevent such problems, we build documentation twice in some situations (see: [#12527 (comment)](https://github.com/apache/airflow/pull/12527#issuecomment-731568418) and .
   
   Thanks! I completely forgot about it :). Now I have to deal with the numpy/pandas/dask dependencies that dropped support for Python 3.6 (some new versions were released today), but I will deal with that in a separate PR shortly.





[GitHub] [airflow] mik-laj commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-813616002


   Now we have different error messages. I suspect that not all artifacts are being shared between the different containers, because when I build the documentation locally I don't have this problem. Let's wait until we remove the Docker magic from the documentation build process; then maybe everything will start working.





[GitHub] [airflow] github-actions[bot] commented on pull request #15187: Adds 'Trino' provider (with lower memory footprint for tests)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15187:
URL: https://github.com/apache/airflow/pull/15187#issuecomment-814028392


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

