Posted to commits@airflow.apache.org by as...@apache.org on 2021/04/15 12:07:19 UTC

[airflow] branch v2-0-test updated (ef876cf -> f769e81)

This is an automated email from the ASF dual-hosted git repository.

ash pushed a change to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git.


 discard ef876cf  Import Connection lazily in hooks to avoid cycles (#15361)
 discard 7b9f091  Fix missing on_load trigger for folder-based plugins (#15208)
 discard 7e28c97  Change default of `[kubernetes] enable_tcp_keepalive` to `True` (#15338)
 discard 315005b  BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods (#15204)
 discard cfeeb14  Fix password masking in CLI action_logging (#15143)
 discard dc9bf44  Fix url generation for TriggerDagRunOperatorLink (#14990)
 discard cf3de8f  Add documentation create/update community providers (#15061)
 discard 894c646  Restore base lineage backend (#14146)
 discard 3634738  Unable to trigger backfill or manual jobs with Kubernetes executor. (#14160)
 discard 9eb0e13  Bugfix: Task docs are not shown in the Task Instance Detail View (#15191)
 discard ddf306b  Fix mistake and typos in doc/docstrings (#15180)
 discard 97bad2c  Update import path and fix typo in `dag-run.rst` (#15201)
 discard 20ed260  Bugfix: Fix overriding `pod_template_file` in KubernetesExecutor (#15197)
 discard 57a8afd  Fixed #14270: Add error message in OOM situations (#15207)
 discard 4aea2ee  Add new Committers to docs (#15235)
 discard 6cf185e  Add new committers (#14544)
 discard 772c33f  Sort Committers via their names instead of usernames (#14403)
 discard 987ddca  Add Ephraim to Committers List (#14397)
 discard 1dacb83  Add a note in set-config.rst on using Secrets Backend (#15274)
 discard 0e2f295  Fix Providers doc (#15185)
 discard dda7c5c  Replace new url for Stable Airflow Docs (#15169)
 discard 4bb8876  Bugfix: resources in `executor_config` breaks Graph View in UI (#15199)
 discard f074bcf  Better compatibility/diagnostics for arbitrary UID in docker image (#15162)
 discard 66373f8  Run kubernetes tests in parallel (#15222)
 discard 5d05835  Adds new Airbyte provider (#14492)
 discard 984bb4c  Adds 'Trino' provider (with lower memory footprint for tests) (#15187)
 discard 144255f  Constraints are now parallelized and merged in single job (#15211)
 discard 34b3e96  Fix celery executor bug trying to call len on map (#14883)
 discard 5ce558c  Less docker magic in docs building (#15176)
 discard 752191d  not fail on missing status in tests
 discard 5f5a914  Updates 3.6 limits for latest versions of a few libraries (#15209)
 discard e87cd1f  Merges quarantined tests into single job (#15153)
 discard 3d17216  Removes unused CI feature of printing output on error (#15190)
 discard 25caba7  Fixes problem when Pull Request is `weird` - has null head_repo (#15189)
 discard 3e9633e  Bump K8S versions to latest supported ones. (#15156)
 discard 65c3ecf  Adds Blinker dependency which is missing after recent changes (#15182)
     new 2ae87ec  Adds Blinker dependency which is missing after recent changes (#15182)
     new cf41de4  Bump K8S versions to latest supported ones. (#15156)
     new 6f4e134  Fixes problem when Pull Request is `weird` - has null head_repo (#15189)
     new 1c04642  Removes unused CI feature of printing output on error (#15190)
     new ca9dba0  Merges quarantined tests into single job (#15153)
     new 7ff9b8c  Updates 3.6 limits for latest versions of a few libraries (#15209)
     new f473309  not fail on missing status in tests
     new b61feb6  Less docker magic in docs building (#15176)
     new 22b2a80  Fix celery executor bug trying to call len on map (#14883)
     new a51707a  Constraints are now parallelized and merged in single job (#15211)
     new 0558900  Adds 'Trino' provider (with lower memory footprint for tests) (#15187)
     new a05247b  Adds new Airbyte provider (#14492)
     new 14f7116  Run kubernetes tests in parallel (#15222)
     new 6860da4  Better compatibility/diagnostics for arbitrary UID in docker image (#15162)
     new b622915  Bugfix: resources in `executor_config` breaks Graph View in UI (#15199)
     new b96a8bf  Replace new url for Stable Airflow Docs (#15169)
     new 1f7e364  Fix Providers doc (#15185)
     new 24503ca  Add a note in set-config.rst on using Secrets Backend (#15274)
     new b95155d  Add Ephraim to Committers List (#14397)
     new 8d902cb  Sort Committers via their names instead of usernames (#14403)
     new 4ce74f8  Add new committers (#14544)
     new ab570a0  Add new Committers to docs (#15235)
     new 8adb126  Fixed #14270: Add error message in OOM situations (#15207)
     new 18a153c  Bugfix: Fix overriding `pod_template_file` in KubernetesExecutor (#15197)
     new 863250d  Update import path and fix typo in `dag-run.rst` (#15201)
     new 6b0421b  Fix mistake and typos in doc/docstrings (#15180)
     new c8d1c67  Bugfix: Task docs are not shown in the Task Instance Detail View (#15191)
     new 9b4b356  Unable to trigger backfill or manual jobs with Kubernetes executor. (#14160)
     new ac00eab  Restore base lineage backend (#14146)
     new 096435e  Add documentation create/update community providers (#15061)
     new 68f5b40  Fix url generation for TriggerDagRunOperatorLink (#14990)
     new 84d305f  Fix password masking in CLI action_logging (#15143)
     new 110adfe  BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods (#15204)
     new def23a0  Change default of `[kubernetes] enable_tcp_keepalive` to `True` (#15338)
     new 683005a  Fix missing on_load trigger for folder-based plugins (#15208)
     new f769e81  Import Connection lazily in hooks to avoid cycles (#15361)

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (ef876cf)
            \
             N -- N -- N   refs/heads/v2-0-test (f769e81)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 36 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
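
For anyone who wants to inspect the rewrite locally, the two ranges below list the
discarded and the new revisions respectively. This is a minimal sketch and assumes the
old tip ef876cf is still reachable in your clone (for example from an earlier fetch):

    # Revisions reachable from the old tip but dropped from the branch ("discard")
    git log --oneline f769e81..ef876cf

    # Revisions that are new on the rewritten branch ("new")
    git log --oneline ef876cf..f769e81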


Summary of changes:
 setup.cfg | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

[airflow] 19/36: Add Ephraim to Committers List (#14397)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit b95155d793b7daec9dba7cca7ea27ae88d62bdc2
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Tue Feb 23 20:40:32 2021 +0000

    Add Ephraim to Committers List (#14397)
    
    https://lists.apache.org/thread.html/r4d6f50e0790c02a28da41066b97bf7b46ddda99bc2949de08647b2df%40%3Cdev.airflow.apache.org%3E
    (cherry picked from commit 4c35955bb93ec8276d3b9b3c5994c4be6f96f907)
---
 docs/apache-airflow/project.rst | 1 +
 docs/spelling_wordlist.txt      | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/docs/apache-airflow/project.rst b/docs/apache-airflow/project.rst
index 6b1cd7a..216b927 100644
--- a/docs/apache-airflow/project.rst
+++ b/docs/apache-airflow/project.rst
@@ -45,6 +45,7 @@ Committers
 - @bolkedebruin (Bolke de Bruin)
 - @criccomini (Chris Riccomini)
 - @dimberman (Daniel Imberman)
+- @ephraimbuddy (Ephraim Anierobi)
 - @feluelle (Felix Uellendall)
 - @feng-tao (Tao Feng)
 - @fokko (Fokko Driesprong)
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index 2ebb5d1..34506e9 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -10,6 +10,7 @@ AnalyzeEntitiesResponse
 AnalyzeSentimentResponse
 AnalyzeSyntaxResponse
 Anand
+Anierobi
 AnnotateTextResponse
 Ansible
 AppBuilder
@@ -127,6 +128,7 @@ EmrCreateJobFlow
 Enum
 Env
 EnvVar
+Ephraim
 ExaConnection
 Exasol
 Failover
@@ -712,6 +714,7 @@ env
 envFrom
 envvar
 eols
+ephraimbuddy
 errno
 eslint
 etl

[airflow] 02/36: Bump K8S versions to latest supported ones. (#15156)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit cf41de43b06b95d085d60576b944c20061979245
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Sun Apr 4 15:08:18 2021 +0200

    Bump K8S versions to latest supported ones. (#15156)
    
    K8S has a one-year support policy. This PR updates the
    K8S versions we use to test to the latest available in three
    supported versions of K8S as of now: 1.20, 1.19, and 1.18.
    
    The 1.16 and 1.17 versions are not supported any more as of today.
    
    https://en.wikipedia.org/wiki/Kubernetes
    
    This change also bumps kind to the latest version (we use kind for
    K8S testing) and fixes configuration to match this version.
    
    (cherry picked from commit 36ab9dd7c4188278068c9b8c280d874760f02c5b)
---
 BREEZE.rst                                   |  8 ++++----
 README.md                                    |  2 +-
 breeze-complete                              |  4 ++--
 docs/apache-airflow/installation.rst         |  2 +-
 scripts/ci/kubernetes/kind-cluster-conf.yaml | 15 ++++-----------
 scripts/ci/libraries/_initialization.sh      |  4 ++--
 6 files changed, 14 insertions(+), 21 deletions(-)
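
For local experiments against the versions listed above, kind can pin the node image to a
specific Kubernetes release. A minimal sketch, assuming the kindest/node image tags that
kind v0.10.0 publishes for these versions:

    # Create a throwaway cluster on one of the newly supported K8S versions
    kind create cluster --name airflow-k8s-test --image kindest/node:v1.20.2

    # Tear the cluster down again when finished
    kind delete cluster --name airflow-k8s-test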

diff --git a/BREEZE.rst b/BREEZE.rst
index 293cb37..2a8a74a 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -2485,17 +2485,17 @@ This is the current syntax for  `./breeze <./breeze>`_:
           Kubernetes version - only used in case one of kind-cluster commands is used.
           One of:
 
-                 v1.18.6 v1.17.5 v1.16.9
+                 v1.20.2 v1.19.7 v1.18.15
 
-          Default: v1.18.6
+          Default: v1.20.2
 
   --kind-version KIND_VERSION
           Kind version - only used in case one of kind-cluster commands is used.
           One of:
 
-                 v0.8.0
+                 v0.10.0
 
-          Default: v0.8.0
+          Default: v0.10.0
 
   --helm-version HELM_VERSION
           Helm version - only used in case one of kind-cluster commands is used.
diff --git a/README.md b/README.md
index 7385ed2..0270131 100644
--- a/README.md
+++ b/README.md
@@ -84,7 +84,7 @@ Apache Airflow is tested with:
 | PostgreSQL   | 9.6, 10, 11, 12, 13       | 9.6, 10, 11, 12, 13      | 9.6, 10, 11, 12, 13        |
 | MySQL        | 5.7, 8                    | 5.7, 8                   | 5.6, 5.7                   |
 | SQLite       | 3.15.0+                   | 3.15.0+                  | 3.15.0+                    |
-| Kubernetes   | 1.16.9, 1.17.5, 1.18.6    | 1.16.9, 1.17.5, 1.18.6   | 1.16.9, 1.17.5, 1.18.6     |
+| Kubernetes   | 1.20, 1.19, 1.18          | 1.20, 1.19, 1.18         | 1.18, 1.17, 1.16           |
 
 **Note:** MySQL 5.x versions are unable to or have limitations with
 running multiple schedulers -- please see the "Scheduler" docs. MariaDB is not tested/recommended.
diff --git a/breeze-complete b/breeze-complete
index a75b267..83dfe9f 100644
--- a/breeze-complete
+++ b/breeze-complete
@@ -30,9 +30,9 @@ _breeze_allowed_generate_constraints_modes="source-providers pypi-providers no-p
 # registrys is good here even if it is not correct english. We are adding s automatically to all variables
 _breeze_allowed_github_registrys="docker.pkg.github.com ghcr.io"
 _breeze_allowed_kubernetes_modes="image"
-_breeze_allowed_kubernetes_versions="v1.18.6 v1.17.5 v1.16.9"
+_breeze_allowed_kubernetes_versions="v1.20.2 v1.19.7 v1.18.15"
 _breeze_allowed_helm_versions="v3.2.4"
-_breeze_allowed_kind_versions="v0.8.0"
+_breeze_allowed_kind_versions="v0.10.0"
 _breeze_allowed_mysql_versions="5.7 8"
 _breeze_allowed_postgres_versions="9.6 10 11 12 13"
 _breeze_allowed_kind_operations="start stop restart status deploy test shell k9s"
diff --git a/docs/apache-airflow/installation.rst b/docs/apache-airflow/installation.rst
index 0184216..a348334 100644
--- a/docs/apache-airflow/installation.rst
+++ b/docs/apache-airflow/installation.rst
@@ -42,7 +42,7 @@ Airflow is tested with:
   * MySQL: 5.7, 8
   * SQLite: 3.15.0+
 
-* Kubernetes: 1.16.9, 1.17.5, 1.18.6
+* Kubernetes: 1.18.15 1.19.7 1.20.2
 
 **Note:** MySQL 5.x versions are unable to or have limitations with
 running multiple schedulers -- please see: :doc:`/scheduler`. MariaDB is not tested/recommended.
diff --git a/scripts/ci/kubernetes/kind-cluster-conf.yaml b/scripts/ci/kubernetes/kind-cluster-conf.yaml
index df60820..f03c1b7 100644
--- a/scripts/ci/kubernetes/kind-cluster-conf.yaml
+++ b/scripts/ci/kubernetes/kind-cluster-conf.yaml
@@ -16,9 +16,10 @@
 # under the License.
 ---
 kind: Cluster
-apiVersion: kind.sigs.k8s.io/v1alpha3
+apiVersion: kind.x-k8s.io/v1alpha4
 networking:
-  apiServerAddress: 0.0.0.0
+  ipFamily: ipv4
+  apiServerAddress: "127.0.0.1"
   apiServerPort: 19090
 nodes:
   - role: control-plane
@@ -26,13 +27,5 @@ nodes:
     extraPortMappings:
       - containerPort: 30007
         hostPort: 8080
-        listenAddress: "0.0.0.0"
+        listenAddress: "127.0.0.1"
         protocol: TCP
-kubeadmConfigPatchesJson6902:
-  - group: kubeadm.k8s.io
-    version: v1beta2
-    kind: ClusterConfiguration
-    patch: |
-      - op: add
-        path: /apiServer/certSANs/-
-        value: docker
diff --git a/scripts/ci/libraries/_initialization.sh b/scripts/ci/libraries/_initialization.sh
index cb42693..f924962 100644
--- a/scripts/ci/libraries/_initialization.sh
+++ b/scripts/ci/libraries/_initialization.sh
@@ -476,13 +476,13 @@ function initialization::initialize_provider_package_building() {
 # Determine versions of kubernetes cluster and tools used
 function initialization::initialize_kubernetes_variables() {
     # Currently supported versions of Kubernetes
-    CURRENT_KUBERNETES_VERSIONS+=("v1.18.6" "v1.17.5" "v1.16.9")
+    CURRENT_KUBERNETES_VERSIONS+=("v1.20.2" "v1.19.7" "v1.18.15")
     export CURRENT_KUBERNETES_VERSIONS
     # Currently supported modes of Kubernetes
     CURRENT_KUBERNETES_MODES+=("image")
     export CURRENT_KUBERNETES_MODES
     # Currently supported versions of Kind
-    CURRENT_KIND_VERSIONS+=("v0.8.0")
+    CURRENT_KIND_VERSIONS+=("v0.10.0")
     export CURRENT_KIND_VERSIONS
     # Currently supported versions of Helm
     CURRENT_HELM_VERSIONS+=("v3.2.4")

[airflow] 10/36: Constraints are now parallelized and merged in single job (#15211)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit a51707a956c4ec61c3b608f9e5a84caf1267bddf
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Tue Apr 6 04:08:11 2021 +0200

    Constraints are now parallelized and merged in single job (#15211)
    
    Originally, the constraints were generated in separate jobs and uploaded as
    artifacts and then joined by a separate push job. Thanks to parallel
    processing, we can now do that all in a single job, with both cost and
    time savings.
    
    (cherry picked from commit aebacd74058d01cfecaf913c04c0dbc50bb188ea)
---
 .github/workflows/ci.yml                           | 64 ++++++----------------
 BREEZE.rst                                         | 39 ++++++-------
 CONTRIBUTING.rst                                   | 19 +++++--
 scripts/ci/constraints/ci_commit_constraints.sh    |  2 +-
 .../ci_generate_all_constraints.sh}                | 13 ++++-
 scripts/ci/constraints/ci_generate_constraints.sh  |  8 +++
 .../images/ci_wait_for_and_verify_all_ci_images.sh |  5 +-
 .../ci_wait_for_and_verify_all_prod_images.sh      |  4 +-
 scripts/ci/libraries/_parallel.sh                  | 13 +++++
 9 files changed, 89 insertions(+), 78 deletions(-)
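
The single-job approach fans one per-Python-version run out over GNU parallel, as the new
ci_generate_all_constraints.sh below does. A minimal standalone sketch of that pattern
(the script name generate_constraints_one_version.sh and the log directory are
illustrative, not the exact CI code):

    # One generation job per Python version, run in parallel, with per-job result logs
    CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING="3.6 3.7 3.8"
    parallel --results /tmp/constraints-logs \
        ./generate_constraints_one_version.sh ::: \
        ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}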

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 86bc960..21a5429 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -1114,17 +1114,21 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
     timeout-minutes: 10
     name: "Constraints"
     runs-on: ${{ fromJson(needs.build-info.outputs.runsOn) }}
-    strategy:
-      matrix:
-        python-version: ${{ fromJson(needs.build-info.outputs.pythonVersions) }}
-      fail-fast: false
     needs:
       - build-info
       - ci-images
+      - prod-images
+      - static-checks
+      - static-checks-pylint
+      - tests-sqlite
+      - tests-mysql
+      - tests-postgres
+      - tests-kubernetes
     env:
       RUNS_ON: ${{ fromJson(needs.build-info.outputs.runsOn) }}
       PYTHON_MAJOR_MINOR_VERSION: ${{ matrix.python-version }}
       GITHUB_REGISTRY: ${{ needs.ci-images.outputs.githubRegistry }}
+      CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING: ${{needs.build-info.outputs.pythonVersionsListAsString}}
     # Only run it for direct pushes
     if: >
       github.ref == 'refs/heads/master' || github.ref == 'refs/heads/v1-10-test' ||
@@ -1140,54 +1144,22 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
           python-version: ${{ env.PYTHON_MAJOR_MINOR_VERSION }}
       - name: "Free space"
         run: ./scripts/ci/tools/ci_free_space_on_ci.sh
-      - name: "Prepare CI image ${{env.PYTHON_MAJOR_MINOR_VERSION}}:${{ github.sha }}"
-        run: ./scripts/ci/images/ci_prepare_ci_image_on_ci.sh
+      - name: >
+          Wait for CI images
+          ${{ needs.build-info.outputs.pythonVersions }}:${{ env.GITHUB_REGISTRY_PULL_IMAGE_TAG }}
+        run: ./scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh
       - name: "Generate constraints with PyPI providers"
-        run: ./scripts/ci/constraints/ci_generate_constraints.sh
+        run: ./scripts/ci/constraints/ci_generate_all_constraints.sh
         env:
           GENERATE_CONSTRAINTS_MODE: "pypi-providers"
       - name: "Generate constraints with source providers"
-        run: ./scripts/ci/constraints/ci_generate_constraints.sh
+        run: ./scripts/ci/constraints/ci_generate_all_constraints.sh
         env:
           GENERATE_CONSTRAINTS_MODE: "source-providers"
       - name: "Generate constraints without providers"
-        run: ./scripts/ci/constraints/ci_generate_constraints.sh
+        run: ./scripts/ci/constraints/ci_generate_all_constraints.sh
         env:
           GENERATE_CONSTRAINTS_MODE: "no-providers"
-      - name: "Upload constraint artifacts"
-        uses: actions/upload-artifact@v2
-        with:
-          name: 'constraints-${{matrix.python-version}}'
-          path: './files/constraints-${{matrix.python-version}}/constraints-*${{matrix.python-version}}.txt'
-          retention-days: 7
-
-  constraints-push:
-    timeout-minutes: 10
-    name: "Constraints push"
-    runs-on: ${{ fromJson(needs.build-info.outputs.runsOn) }}
-    needs:
-      - build-info
-      - constraints
-      - ci-images
-      - prod-images
-      - static-checks
-      - static-checks-pylint
-      - tests-sqlite
-      - tests-mysql
-      - tests-postgres
-      - tests-kubernetes
-    # Only run it for direct pushes
-    if: >
-      github.ref == 'refs/heads/master' || github.ref == 'refs/heads/v1-10-test' ||
-      github.ref == 'refs/heads/v2-0-test'
-    env:
-      RUNS_ON: ${{ fromJson(needs.build-info.outputs.runsOn) }}
-    steps:
-      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
-        uses: actions/checkout@v2
-        with:
-          persist-credentials: false
-          submodules: recursive
       - name: "Set constraints branch name"
         id: constraints-branch
         run: ./scripts/ci/constraints/ci_branch_constraints.sh
@@ -1197,10 +1169,6 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
           path: "repo"
           ref: ${{ steps.constraints-branch.outputs.branch }}
           persist-credentials: false
-      - name: "Get all artifacts (constraints)"
-        uses: actions/download-artifact@v2
-        with:
-          path: 'artifacts'
       - name: "Commit changed constraint files for ${{needs.build-info.outputs.pythonVersions}}"
         run: ./scripts/ci/constraints/ci_commit_constraints.sh
       - name: "Push changes"
@@ -1223,7 +1191,7 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
       - tests-postgres
       - tests-mysql
       - tests-kubernetes
-      - constraints-push
+      - constraints
       - prepare-test-provider-packages-wheel
       - prepare-test-provider-packages-sdist
     if: github.event_name == 'schedule' &&  github.repository == 'apache/airflow'
diff --git a/BREEZE.rst b/BREEZE.rst
index 2a8a74a..72633e8 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -809,38 +809,39 @@ Generating constraints
 
 Whenever setup.py gets modified, the CI master job will re-generate constraint files. Those constraint
 files are stored in separated orphan branches: ``constraints-master``, ``constraints-2-0``
-and ``constraints-1-10``. They are stored separately for each python version and there are separate
-constraints for:
+and ``constraints-1-10``.
+
+Those are constraint files as described in detail in the
+`<CONTRIBUTING.rst#pinned-constraint-files>`_ contributing documentation.
+
+You can use ``./breeze generate-constraints`` command to manually generate constraints for a single python
+version and single constraint mode like this:
+
+.. code-block:: bash
+
+     ./breeze generate-constraints --generate-constraints-mode pypi-providers
+
+
+Constraints are generated separately for each python version and there are separate constraints modes:
 
 * 'constraints' - those are constraints generated by matching the current airflow version from sources
    and providers that are installed from PyPI. Those are constraints used by the users who want to
-   install airflow with pip
+   install airflow with pip. Use ``pypi-providers`` mode for that.
 
 * "constraints-source-providers" - those are constraints generated by using providers installed from
   current sources. While adding new providers their dependencies might change, so this set of providers
   is the current set of the constraints for airflow and providers from the current master sources.
-  Those providers are used by CI system to keep "stable" set of constraints.
+  Those providers are used by CI system to keep "stable" set of constraints. Use
+  ``source-providers`` mode for that.
 
 * "constraints-no-providers" - those are constraints generated from only Apache Airflow, without any
   providers. If you want to manage airflow separately and then add providers individually, you can
-  use those.
-
-Those are constraint files as described in detail in the
-`<CONTRIBUTING.rst#pinned-constraint-files>`_ contributing documentation.
+  use those. Use ``no-providers`` mode for that.
 
 In case someone modifies setup.py, the ``CRON`` scheduled CI build automatically upgrades and
 pushes changed to the constraint files, however you can also perform test run of this locally using
-``generate-constraints`` command of Breeze.
-
-.. code-block:: bash
-
-  for python_version in 3.6 3.7 3.8
-  do
-    ./breeze generate-constraints --generate-constraints-mode source-providers --python ${python_version}
-    ./breeze generate-constraints --generate-constraints-mode pypi-providers --python ${python_version}
-    ./breeze generate-constraints --generate-constraints-mode no-providers --python ${python_version}
-  done
-
+the procedure described in `<CONTRIBUTING.rst#mnully-generating-constraint-files>`_ which utilises
+multiple processors on your local machine to generate such constraints faster.
 
 This bumps the constraint files to latest versions and stores hash of setup.py. The generated constraint
 and setup.py hash files are stored in the ``files`` folder and while generating the constraints diff
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
index 6673ddf..19c4077 100644
--- a/CONTRIBUTING.rst
+++ b/CONTRIBUTING.rst
@@ -871,19 +871,26 @@ Manually generating constraint files
 ------------------------------------
 
 The constraint files are generated automatically by the CI job. Sometimes however it is needed to regenerate
-them manually (committers only). For example when master build did not succeed for quite some time). This can be done by
-running this:
+them manually (committers only). For example when master build did not succeed for quite some time).
+This can be done by running this (it utilizes parallel preparation of the constraints):
 
 .. code-block:: bash
 
-    for python_version in 3.6 3.7 3.8
+    export CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING="3.6 3.7 3.8"
+    for python_version in $(echo "${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}")
     do
-      ./breeze generate-constraints --generate-constraints-mode source-providers --python ${python_version} --build-cache-local
-      ./breeze generate-constraints --generate-constraints-mode pypi-providers --python ${python_version} --build-cache-local
-      ./breeze generate-constraints --generate-constraints-mode no-providers --python ${python_version} --build-cache-local
+      ./breeze build-image --upgrade-to-newer-dependencies --python ${python_version} --build-cache-local
+      ./breeze build-image --upgrade-to-newer-dependencies --python ${python_version} --build-cache-local
+      ./breeze build-image --upgrade-to-newer-dependencies --python ${python_version} --build-cache-local
     done
+
+    GENERATE_CONSTRAINTS_MODE="pypi-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
+    GENERATE_CONSTRAINTS_MODE="source-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
+    GENERATE_CONSTRAINTS_MODE="no-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
+
     AIRFLOW_SOURCES=$(pwd)
 
+
 The constraints will be generated in "files/constraints-PYTHON_VERSION/constraints-*.txt files. You need to
 checkout the right 'constraints-' branch in a separate repository and then you can copy, commit and push the
 generated files:
diff --git a/scripts/ci/constraints/ci_commit_constraints.sh b/scripts/ci/constraints/ci_commit_constraints.sh
index 7c24dc5..c3a7521 100755
--- a/scripts/ci/constraints/ci_commit_constraints.sh
+++ b/scripts/ci/constraints/ci_commit_constraints.sh
@@ -18,7 +18,7 @@
 # shellcheck source=scripts/ci/libraries/_script_init.sh
 . "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
 
-cp -v ./artifacts/constraints-*/constraints*.txt repo/
+cp -v ./files/constraints-*/constraints*.txt repo/
 cd repo || exit 1
 git config --local user.email "dev@airflow.apache.org"
 git config --local user.name "Automated GitHub Actions commit"
diff --git a/scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh b/scripts/ci/constraints/ci_generate_all_constraints.sh
similarity index 75%
copy from scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh
copy to scripts/ci/constraints/ci_generate_all_constraints.sh
index 7e09b1c..9a7a77e 100755
--- a/scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh
+++ b/scripts/ci/constraints/ci_generate_all_constraints.sh
@@ -17,6 +17,7 @@
 # under the License.
 set -euo pipefail
 
+
 # We cannot perform full initialization because it will be done later in the "single run" scripts
 # And some readonly variables are set there, therefore we only selectively reuse parallel lib needed
 LIBRARIES_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")/../libraries/" && pwd)
@@ -25,10 +26,18 @@ source "${LIBRARIES_DIR}/_all_libs.sh"
 
 initialization::set_output_color_variables
 
+export CHECK_IMAGE_FOR_REBUILD="false"
+echo
+echo "${COLOR_YELLOW}Skip rebuilding CI images. Assume the one we have is good!${COLOR_RESET}"
+echo "${COLOR_YELLOW}You must run './breeze build-image --upgrade-to-newer-dependencies before for all python versions before running this one!${COLOR_RESET}"
+echo
+
 parallel::make_sure_gnu_parallel_is_installed
 
+parallel::make_sure_python_versions_are_specified
+
 echo
-echo "Waiting for all CI images to appear: ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}"
+echo "${COLOR_BLUE}Generating all constraint files${COLOR_RESET}"
 echo
 
 parallel::initialize_monitoring
@@ -37,5 +46,5 @@ parallel::monitor_progress
 
 # shellcheck disable=SC2086
 parallel --results "${PARALLEL_MONITORED_DIR}" \
-    "$( dirname "${BASH_SOURCE[0]}" )/ci_wait_for_and_verify_ci_image.sh" ::: \
+    "$( dirname "${BASH_SOURCE[0]}" )/ci_generate_constraints.sh" ::: \
     ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}
diff --git a/scripts/ci/constraints/ci_generate_constraints.sh b/scripts/ci/constraints/ci_generate_constraints.sh
index 10a4107..7e1cefa 100755
--- a/scripts/ci/constraints/ci_generate_constraints.sh
+++ b/scripts/ci/constraints/ci_generate_constraints.sh
@@ -15,6 +15,14 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+if [[ $1 == "" ]]; then
+  >&2 echo "Requires python MAJOR/MINOR version as first parameter"
+  exit 1
+fi
+
+export PYTHON_MAJOR_MINOR_VERSION=$1
+shift
+
 # shellcheck source=scripts/ci/libraries/_script_init.sh
 . "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
 
diff --git a/scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh b/scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh
index 7e09b1c..4255374 100755
--- a/scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh
+++ b/scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh
@@ -27,10 +27,13 @@ initialization::set_output_color_variables
 
 parallel::make_sure_gnu_parallel_is_installed
 
+parallel::make_sure_python_versions_are_specified
+
 echo
-echo "Waiting for all CI images to appear: ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}"
+echo "${COLOR_BLUE}Waiting for all CI images to appear${COLOR_RESET}"
 echo
 
+
 parallel::initialize_monitoring
 
 parallel::monitor_progress
diff --git a/scripts/ci/images/ci_wait_for_and_verify_all_prod_images.sh b/scripts/ci/images/ci_wait_for_and_verify_all_prod_images.sh
index 2d1da54..08ed54b 100755
--- a/scripts/ci/images/ci_wait_for_and_verify_all_prod_images.sh
+++ b/scripts/ci/images/ci_wait_for_and_verify_all_prod_images.sh
@@ -27,8 +27,10 @@ initialization::set_output_color_variables
 
 parallel::make_sure_gnu_parallel_is_installed
 
+parallel::make_sure_python_versions_are_specified
+
 echo
-echo "Waiting for all PROD images to appear: ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}"
+echo "${COLOR_BLUE}Waiting for all PROD images to appear${COLOR_RESET}"
 echo
 
 parallel::initialize_monitoring
diff --git a/scripts/ci/libraries/_parallel.sh b/scripts/ci/libraries/_parallel.sh
index e2f8ad4..7239e82 100644
--- a/scripts/ci/libraries/_parallel.sh
+++ b/scripts/ci/libraries/_parallel.sh
@@ -193,3 +193,16 @@ function parallel::cleanup_runner() {
     parallel::kill_stale_semaphore_locks
     start_end::group_end
 }
+
+
+function parallel::make_sure_python_versions_are_specified() {
+    if [[ -z "${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING=}" ]]; then
+        echo
+        echo "${COLOR_RED}The CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING variable must be set and list python versions to use!${COLOR_RESET}"
+        echo
+        exit 1
+    fi
+    echo
+    echo "${COLOR_BLUE}Running parallel builds for those Python versions: ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}!${COLOR_RESET}"
+    echo
+}

[airflow] 13/36: Run kubernetes tests in parallel (#15222)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 14f71164485f48310e37b9127a7fabab618c1ec6
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Wed Apr 7 20:08:00 2021 +0200

    Run kubernetes tests in parallel (#15222)
    
    (cherry picked from commit ea0710edc106a9091d666a3be629201ad7cbbcad)
---
 .github/workflows/ci.yml                           |  61 ++++++------
 .gitignore                                         |   2 +-
 .rat-excludes                                      |   2 +-
 TESTING.rst                                        |   6 ++
 breeze                                             |   2 +-
 .../ci/images/ci_wait_for_and_verify_ci_image.sh   |   3 +-
 .../ci/images/ci_wait_for_and_verify_prod_image.sh |  17 ++--
 scripts/ci/kubernetes/ci_run_kubernetes_tests.sh   |  10 +-
 ...tup_cluster_and_deploy_airflow_to_kubernetes.sh |   6 +-
 ...cluster_and_run_kubernetes_tests_single_job.sh} |  51 +++++-----
 ...lusters_and_run_kubernetes_tests_in_parallel.sh | 106 +++++++++++++++++++++
 scripts/ci/kubernetes/kind-cluster-conf.yaml       |   4 +-
 scripts/ci/libraries/_build_images.sh              |  23 +----
 scripts/ci/libraries/_docker_engine_resources.sh   |  10 +-
 scripts/ci/libraries/_initialization.sh            |  10 +-
 scripts/ci/libraries/_kind.sh                      |  57 +++++------
 scripts/ci/libraries/_parallel.sh                  |  15 ++-
 scripts/ci/libraries/_testing.sh                   |   2 +-
 scripts/ci/libraries/_verbosity.sh                 |   2 +-
 scripts/ci/selective_ci_checks.sh                  |   2 +
 scripts/ci/testing/ci_run_quarantined_tests.sh     |   3 +
 scripts/in_container/entrypoint_ci.sh              |   4 +-
 22 files changed, 248 insertions(+), 150 deletions(-)
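
The new ci_setup_clusters_and_run_kubernetes_tests_in_parallel.sh script shown further
down throttles concurrent kind clusters with GNU parallel's semaphore mode. A minimal
sketch of that mechanism outside the CI libraries (run_one_k8s_test_job.sh and the job
limit are illustrative):

    # Launch at most 2 cluster jobs at a time under a named semaphore, then wait for all
    for kubernetes_version in v1.20.2 v1.19.7 v1.18.15; do
        parallel --bg --semaphore --semaphorename kubernetes-tests --jobs 2 \
            ./run_one_k8s_test_job.sh "${kubernetes_version}"
    done
    parallel --semaphore --semaphorename kubernetes-tests --wait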

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 21a5429..243557f 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -130,6 +130,7 @@ jobs:
       pythonVersionsListAsString: ${{ steps.selective-checks.outputs.python-versions-list-as-string }}
       defaultPythonVersion: ${{ steps.selective-checks.outputs.default-python-version }}
       kubernetesVersions: ${{ steps.selective-checks.outputs.kubernetes-versions }}
+      kubernetesVersionsListAsString: ${{ steps.selective-checks.outputs.kubernetes-versions-list-as-string }}
       defaultKubernetesVersion: ${{ steps.selective-checks.outputs.default-kubernetes-version }}
       kubernetesModes: ${{ steps.selective-checks.outputs.kubernetes-modes }}
       defaultKubernetesMode: ${{ steps.selective-checks.outputs.default-kubernetes-mode }}
@@ -145,7 +146,6 @@ jobs:
       postgresExclude: ${{ steps.selective-checks.outputs.postgres-exclude }}
       mysqlExclude: ${{ steps.selective-checks.outputs.mysql-exclude }}
       sqliteExclude: ${{ steps.selective-checks.outputs.sqlite-exclude }}
-      kubernetesExclude: ${{ steps.selective-checks.outputs.kubernetes-exclude }}
       run-tests: ${{ steps.selective-checks.outputs.run-tests }}
       run-kubernetes-tests: ${{ steps.selective-checks.outputs.run-kubernetes-tests }}
       basic-checks-only: ${{ steps.selective-checks.outputs.basic-checks-only }}
@@ -628,7 +628,7 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
         with:
           name: >
             coverage-helm
-          path: "./files/coverage.xml"
+          path: "./files/coverage*.xml"
           retention-days: 7
 
   tests-postgres:
@@ -686,7 +686,7 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
         with:
           name: >
             coverage-postgres-${{matrix.python-version}}-${{matrix.postgres-version}}
-          path: "./files/coverage.xml"
+          path: "./files/coverage*.xml"
           retention-days: 7
 
   tests-mysql:
@@ -742,7 +742,7 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
         uses: actions/upload-artifact@v2
         with:
           name: coverage-mysql-${{matrix.python-version}}-${{matrix.mysql-version}}
-          path: "./files/coverage.xml"
+          path: "./files/coverage*.xml"
           retention-days: 7
 
   tests-sqlite:
@@ -796,7 +796,7 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
         uses: actions/upload-artifact@v2
         with:
           name: coverage-sqlite-${{matrix.python-version}}
-          path: ./files/coverage.xml
+          path: ./files/coverage*.xml
           retention-days: 7
 
   tests-quarantined:
@@ -867,7 +867,7 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
         uses: actions/upload-artifact@v2
         with:
           name: coverage-quarantined-${{ matrix.backend }}
-          path: "./files/coverage.xml"
+          path: "./files/coverage*.xml"
           retention-days: 7
 
   upload-coverage:
@@ -947,27 +947,22 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
 
   tests-kubernetes:
     timeout-minutes: 50
-    name: K8s ${{matrix.python-version}} ${{matrix.kubernetes-version}} ${{matrix.kubernetes-mode}}
+    name: K8s tests
     runs-on: ${{ fromJson(needs.build-info.outputs.runsOn) }}
     needs: [build-info, prod-images]
-    strategy:
-      matrix:
-        python-version: ${{ fromJson(needs.build-info.outputs.pythonVersions) }}
-        kubernetes-version: ${{ fromJson(needs.build-info.outputs.kubernetesVersions) }}
-        kubernetes-mode: ${{ fromJson(needs.build-info.outputs.kubernetesModes) }}
-        exclude: ${{ fromJson(needs.build-info.outputs.kubernetesExclude) }}
-      fail-fast: false
     env:
       RUNS_ON: ${{ fromJson(needs.build-info.outputs.runsOn) }}
       BACKEND: postgres
       RUN_TESTS: "true"
       RUNTIME: "kubernetes"
-      PYTHON_MAJOR_MINOR_VERSION: "${{ matrix.python-version }}"
-      KUBERNETES_MODE: "${{ matrix.kubernetes-mode }}"
-      KUBERNETES_VERSION: "${{ matrix.kubernetes-version }}"
+      KUBERNETES_MODE: "image"
       KIND_VERSION: "${{ needs.build-info.outputs.defaultKindVersion }}"
       HELM_VERSION: "${{ needs.build-info.outputs.defaultHelmVersion }}"
       GITHUB_REGISTRY: ${{ needs.prod-images.outputs.githubRegistry }}
+      CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING: >
+        ${{needs.build-info.outputs.pythonVersionsListAsString}}
+      CURRENT_KUBERNETES_VERSIONS_AS_STRING: >
+        ${{needs.build-info.outputs.kubernetesVersionsListAsString}}
     if: needs.build-info.outputs.run-kubernetes-tests == 'true'
     steps:
       - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
@@ -980,45 +975,43 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
           python-version: ${{ needs.build-info.outputs.defaultPythonVersion }}
       - name: "Free space"
         run: ./scripts/ci/tools/ci_free_space_on_ci.sh
-      - name: "Prepare PROD Image"
-        run: ./scripts/ci/images/ci_prepare_prod_image_on_ci.sh
-      - name: "Setup cluster and deploy Airflow"
-        id: setp-cluster-deploy-app
-        run: ./scripts/ci/kubernetes/ci_setup_cluster_and_deploy_airflow_to_kubernetes.sh
-        env:
-          # We have the right image pulled already by the previous step
-          SKIP_BUILDING_PROD_IMAGE: "true"
+      - name: "Get all PROD images"
+        run: ./scripts/ci/images/ci_wait_for_and_verify_all_prod_images.sh
       - name: "Cache virtualenv for kubernetes testing"
         uses: actions/cache@v2
         with:
-          path: ".build/.kubernetes_venv_ ${{ needs.build-info.outputs.defaultPythonVersion }}"
+          path: ".build/.kubernetes_venv"
           key: "kubernetes-${{ needs.build-info.outputs.defaultPythonVersion }}\
+-${{needs.build-info.outputs.kubernetesVersionsListAsString}}
+-${{needs.build-info.outputs.pythonVersionsListAsString}}
 -${{ hashFiles('setup.py','setup.cfg') }}"
-          restore-keys: "kubernetes-${{ needs.build-info.outputs.defaultPythonVersion }}-"
+          restore-keys: "kubernetes-${{ needs.build-info.outputs.defaultPythonVersion }}-\
+-${{needs.build-info.outputs.kubernetesVersionsListAsString}}
+-${{needs.build-info.outputs.pythonVersionsListAsString}}"
       - name: "Cache bin folder with tools for kubernetes testing"
         uses: actions/cache@v2
         with:
-          path: ".build/bin"
-          key: "bin-${{ matrix.kubernetes-version }}\
+          path: ".build/kubernetes-bin"
+          key: "kubernetes-binaries
 -${{ needs.build-info.outputs.defaultKindVersion }}\
 -${{ needs.build-info.outputs.defaultHelmVersion }}"
-          restore-keys: "bin-${{ matrix.kubernetes-version }}"
+          restore-keys: "kubernetes-binaries"
       - name: "Kubernetes Tests"
-        run: ./scripts/ci/kubernetes/ci_run_kubernetes_tests.sh
+        run: ./scripts/ci/kubernetes/ci_setup_clusters_and_run_kubernetes_tests_in_parallel.sh
       - name: "Upload KinD logs"
         uses: actions/upload-artifact@v2
         if: failure()
         with:
           name: >
-            kind-logs-${{matrix.kubernetes-mode}}-${{matrix.python-version}}-${{matrix.kubernetes-version}}
+            kind-logs-
           path: /tmp/kind_logs_*
           retention-days: 7
       - name: "Upload artifact for coverage"
         uses: actions/upload-artifact@v2
         with:
           name: >
-            coverage-k8s-${{matrix.kubernetes-mode}}-${{matrix.python-version}}-${{matrix.kubernetes-version}}
-          path: "./files/coverage.xml"
+            coverage-k8s-
+          path: "./files/coverage*.xml"
           retention-days: 7
 
   push-prod-images-to-github-registry:
diff --git a/.gitignore b/.gitignore
index 5da0b18..0454790 100644
--- a/.gitignore
+++ b/.gitignore
@@ -64,7 +64,7 @@ htmlcov/
 .coverage.*
 .cache
 nosetests.xml
-coverage.xml
+coverage*.xml
 *,cover
 .hypothesis/
 .pytest_cache
diff --git a/.rat-excludes b/.rat-excludes
index 9f467d8..125ed46 100644
--- a/.rat-excludes
+++ b/.rat-excludes
@@ -71,7 +71,7 @@ node_modules/*
 coverage/*
 git_version
 flake8_diff.sh
-coverage.xml
+coverage*.xml
 _sources/*
 
 rat-results.txt
diff --git a/TESTING.rst b/TESTING.rst
index 102cac8..24d1e72 100644
--- a/TESTING.rst
+++ b/TESTING.rst
@@ -629,6 +629,12 @@ Entering shell with Kubernetes Cluster
 This shell is prepared to run Kubernetes tests interactively. It has ``kubectl`` and ``kind`` cli tools
 available in the path, it has also activated virtualenv environment that allows you to run tests via pytest.
 
+The binaries are available in ./.build/kubernetes-bin/``KUBERNETES_VERSION`` path.
+The virtualenv is available in ./.build/.kubernetes_venv/``KIND_CLUSTER_NAME``_host_python_``HOST_PYTHON_VERSION``
+
+Where ``KIND_CLUSTER_NAME`` is the name of the cluster and ``HOST_PYTHON_VERSION`` is the version of python
+in the host.
+
 You can enter the shell via those scripts
 
       ./scripts/ci/kubernetes/ci_run_kubernetes_tests.sh [-i|--interactive]   - Activates virtual environment ready to run tests and drops you in
diff --git a/breeze b/breeze
index 302cebb..df24b14 100755
--- a/breeze
+++ b/breeze
@@ -3418,7 +3418,7 @@ function breeze::run_breeze_command() {
     set +u
     local dc_run_file
     local run_command="breeze::run_command"
-    if [[ ${DRY_RUN_DOCKER} != "false" ]]; then
+    if [[ ${DRY_RUN_DOCKER=} != "false" ]]; then
         run_command="breeze::print_command"
     fi
     if [[ ${PRODUCTION_IMAGE} == "true" ]]; then
diff --git a/scripts/ci/images/ci_wait_for_and_verify_ci_image.sh b/scripts/ci/images/ci_wait_for_and_verify_ci_image.sh
index e83b0d6..29daca7 100755
--- a/scripts/ci/images/ci_wait_for_and_verify_ci_image.sh
+++ b/scripts/ci/images/ci_wait_for_and_verify_ci_image.sh
@@ -31,7 +31,6 @@ shift
 function pull_ci_image() {
     local image_name_with_tag="${GITHUB_REGISTRY_AIRFLOW_CI_IMAGE}:${GITHUB_REGISTRY_PULL_IMAGE_TAG}"
     start_end::group_start "Pulling ${image_name_with_tag} image"
-
     push_pull_remove_images::pull_image_github_dockerhub "${AIRFLOW_CI_IMAGE}" "${image_name_with_tag}"
     start_end::group_end
 
@@ -39,7 +38,9 @@ function pull_ci_image() {
 
 push_pull_remove_images::check_if_github_registry_wait_for_image_enabled
 
+start_end::group_start "Configure Docker Registry"
 build_image::configure_docker_registry
+start_end::group_end
 
 export AIRFLOW_CI_IMAGE_NAME="${BRANCH_NAME}-python${PYTHON_MAJOR_MINOR_VERSION}-ci"
 
diff --git a/scripts/ci/images/ci_wait_for_and_verify_prod_image.sh b/scripts/ci/images/ci_wait_for_and_verify_prod_image.sh
index b4a482a..84c310e 100755
--- a/scripts/ci/images/ci_wait_for_and_verify_prod_image.sh
+++ b/scripts/ci/images/ci_wait_for_and_verify_prod_image.sh
@@ -28,17 +28,11 @@ shift
 # shellcheck source=scripts/ci/libraries/_script_init.sh
 . "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
 
-function pull_prod_image() {
-    local image_name_with_tag="${GITHUB_REGISTRY_AIRFLOW_PROD_IMAGE}:${GITHUB_REGISTRY_PULL_IMAGE_TAG}"
-    start_end::group_start "Pulling the ${image_name_with_tag} image and tagging with ${AIRFLOW_PROD_IMAGE}"
-
-    push_pull_remove_images::pull_image_github_dockerhub "${AIRFLOW_PROD_IMAGE}" "${image_name_with_tag}"
-    start_end::group_end
-}
-
 push_pull_remove_images::check_if_github_registry_wait_for_image_enabled
 
+start_end::group_start "Configure Docker Registry"
 build_image::configure_docker_registry
+start_end::group_end
 
 export AIRFLOW_PROD_IMAGE_NAME="${BRANCH_NAME}-python${PYTHON_MAJOR_MINOR_VERSION}"
 
@@ -48,8 +42,11 @@ push_pull_remove_images::wait_for_github_registry_image \
     "${AIRFLOW_PROD_IMAGE_NAME}${GITHUB_REGISTRY_IMAGE_SUFFIX}" "${GITHUB_REGISTRY_PULL_IMAGE_TAG}"
 start_end::group_end
 
+start_end::group_start "Pulling the PROD Image"
 build_images::prepare_prod_build
-
-pull_prod_image
+image_name_with_tag="${GITHUB_REGISTRY_AIRFLOW_PROD_IMAGE}:${GITHUB_REGISTRY_PULL_IMAGE_TAG}"
+verbosity::print_info "Pulling the ${image_name_with_tag} image and tagging with ${AIRFLOW_PROD_IMAGE}"
+push_pull_remove_images::pull_image_github_dockerhub "${AIRFLOW_PROD_IMAGE}" "${image_name_with_tag}"
+start_end::group_end
 
 verify_image::verify_prod_image "${AIRFLOW_PROD_IMAGE}"
diff --git a/scripts/ci/kubernetes/ci_run_kubernetes_tests.sh b/scripts/ci/kubernetes/ci_run_kubernetes_tests.sh
index 65f885c..fbcf66b 100755
--- a/scripts/ci/kubernetes/ci_run_kubernetes_tests.sh
+++ b/scripts/ci/kubernetes/ci_run_kubernetes_tests.sh
@@ -62,7 +62,7 @@ function parse_tests_to_run() {
             "--durations=100"
             "--cov=airflow/"
             "--cov-config=.coveragerc"
-            "--cov-report=xml:files/coverage.xml"
+            "--cov-report=xml:files/coverage=${KIND_CLUSTER_NAME}.xml"
             "--color=yes"
             "--maxfail=50"
             "--pythonwarnings=ignore::DeprecationWarning"
@@ -73,12 +73,12 @@ function parse_tests_to_run() {
 }
 
 function create_virtualenv() {
-    start_end::group_start "Creating virtualenv"
     HOST_PYTHON_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info[0]}.{sys.version_info[1]}")')
     readonly HOST_PYTHON_VERSION
 
-    local virtualenv_path="${BUILD_CACHE_DIR}/.kubernetes_venv_${HOST_PYTHON_VERSION}"
+    local virtualenv_path="${BUILD_CACHE_DIR}/.kubernetes_venv/${KIND_CLUSTER_NAME}_host_python_${HOST_PYTHON_VERSION}"
 
+    mkdir -pv "${BUILD_CACHE_DIR}/.kubernetes_venv/"
     if [[ ! -d ${virtualenv_path} ]]; then
         echo
         echo "Creating virtualenv at ${virtualenv_path}"
@@ -95,14 +95,10 @@ function create_virtualenv() {
 
     pip install -e ".[kubernetes]" \
       --constraint "https://raw.githubusercontent.com/${CONSTRAINTS_GITHUB_REPOSITORY}/${DEFAULT_CONSTRAINTS_BRANCH}/constraints-${HOST_PYTHON_VERSION}.txt"
-
-    start_end::group_end
 }
 
 function run_tests() {
-    start_end::group_start "Running K8S tests"
     pytest "${pytest_args[@]}" "${tests_to_run[@]}"
-    start_end::group_end
 }
 
 cd "${AIRFLOW_SOURCES}" || exit 1
diff --git a/scripts/ci/kubernetes/ci_setup_cluster_and_deploy_airflow_to_kubernetes.sh b/scripts/ci/kubernetes/ci_setup_cluster_and_deploy_airflow_to_kubernetes.sh
index ec493f8..1e0fa36 100755
--- a/scripts/ci/kubernetes/ci_setup_cluster_and_deploy_airflow_to_kubernetes.sh
+++ b/scripts/ci/kubernetes/ci_setup_cluster_and_deploy_airflow_to_kubernetes.sh
@@ -15,9 +15,11 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
+export SKIP_BUILDING_PROD_IMAGE="true"
+
 # shellcheck source=scripts/ci/libraries/_script_init.sh
 . "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
-set -euo pipefail
 
 traps::add_trap "kind::dump_kind_logs" EXIT HUP INT TERM
 
@@ -25,7 +27,7 @@ kind::make_sure_kubernetes_tools_are_installed
 kind::get_kind_cluster_name
 kind::perform_kind_cluster_operation "start"
 build_images::prepare_prod_build
-build_images::build_prod_images_with_group
+build_images::build_prod_images
 kind::build_image_for_kubernetes_tests
 kind::load_image_to_kind_cluster
 kind::deploy_airflow_with_helm
diff --git a/scripts/ci/images/ci_wait_for_and_verify_ci_image.sh b/scripts/ci/kubernetes/ci_setup_cluster_and_run_kubernetes_tests_single_job.sh
similarity index 52%
copy from scripts/ci/images/ci_wait_for_and_verify_ci_image.sh
copy to scripts/ci/kubernetes/ci_setup_cluster_and_run_kubernetes_tests_single_job.sh
index e83b0d6..9b0d86f 100755
--- a/scripts/ci/images/ci_wait_for_and_verify_ci_image.sh
+++ b/scripts/ci/kubernetes/ci_setup_cluster_and_run_kubernetes_tests_single_job.sh
@@ -15,43 +15,40 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
 if [[ $1 == "" ]]; then
-  >&2 echo "Requires python MAJOR/MINOR version as first parameter"
+  >&2 echo "Requires Kubernetes_version as first parameter"
   exit 1
 fi
-
-export PYTHON_MAJOR_MINOR_VERSION=$1
+export KUBERNETES_VERSION=$1
 shift
 
 
-# shellcheck source=scripts/ci/libraries/_script_init.sh
-. "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
-
-function pull_ci_image() {
-    local image_name_with_tag="${GITHUB_REGISTRY_AIRFLOW_CI_IMAGE}:${GITHUB_REGISTRY_PULL_IMAGE_TAG}"
-    start_end::group_start "Pulling ${image_name_with_tag} image"
-
-    push_pull_remove_images::pull_image_github_dockerhub "${AIRFLOW_CI_IMAGE}" "${image_name_with_tag}"
-    start_end::group_end
-
-}
-
-push_pull_remove_images::check_if_github_registry_wait_for_image_enabled
-
-build_image::configure_docker_registry
+if [[ $1 == "" ]]; then
+  >&2 echo "Requires Python Major/Minor version as second parameter"
+  exit 1
+fi
+export PYTHON_MAJOR_MINOR_VERSION=$1
+shift
 
-export AIRFLOW_CI_IMAGE_NAME="${BRANCH_NAME}-python${PYTHON_MAJOR_MINOR_VERSION}-ci"
+# Requires PARALLEL_JOB_STATUS
 
-start_end::group_start "Waiting for ${AIRFLOW_CI_IMAGE_NAME} image to appear"
+if [[ -z "${PARALLEL_JOB_STATUS=}" ]]; then
+    echo "Needs PARALLEL_JOB_STATUS to be set"
+    exit 1
+fi
 
-push_pull_remove_images::wait_for_github_registry_image \
-    "${AIRFLOW_CI_IMAGE_NAME}${GITHUB_REGISTRY_IMAGE_SUFFIX}" "${GITHUB_REGISTRY_PULL_IMAGE_TAG}"
+echo
+echo "KUBERNETES_VERSION:         ${KUBERNETES_VERSION}"
+echo "PYTHON_MAJOR_MINOR_VERSION: ${PYTHON_MAJOR_MINOR_VERSION}"
+echo
 
-build_images::prepare_ci_build
+# shellcheck source=scripts/ci/libraries/_script_init.sh
+. "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
 
-pull_ci_image
+kind::get_kind_cluster_name
+trap 'echo $? > "${PARALLEL_JOB_STATUS}"; kind::perform_kind_cluster_operation "stop"' EXIT HUP INT TERM
 
-verify_image::verify_ci_image "${AIRFLOW_CI_IMAGE}"
+"$( dirname "${BASH_SOURCE[0]}" )/ci_setup_cluster_and_deploy_airflow_to_kubernetes.sh"
 
-start_end::group_end
+export CLUSTER_FORWARDED_PORT="${FORWARDED_PORT_NUMBER}"
+"$( dirname "${BASH_SOURCE[0]}" )/ci_run_kubernetes_tests.sh"
diff --git a/scripts/ci/kubernetes/ci_setup_clusters_and_run_kubernetes_tests_in_parallel.sh b/scripts/ci/kubernetes/ci_setup_clusters_and_run_kubernetes_tests_in_parallel.sh
new file mode 100755
index 0000000..88aa2cd
--- /dev/null
+++ b/scripts/ci/kubernetes/ci_setup_clusters_and_run_kubernetes_tests_in_parallel.sh
@@ -0,0 +1,106 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+set -euo pipefail
+
+# We cannot perform full initialization because it will be done later in the "single run" scripts
+# And some readonly variables are set there, therefore we only selectively reuse parallel lib needed
+LIBRARIES_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")/../libraries/" && pwd)
+# shellcheck source=scripts/ci/libraries/_all_libs.sh
+source "${LIBRARIES_DIR}/_all_libs.sh"
+export SEMAPHORE_NAME="kubernetes-tests"
+
+function get_maximum_parallel_k8s_jobs() {
+    docker_engine_resources::get_available_cpus_in_docker
+    if [[ -n ${RUNS_ON=} && ${RUNS_ON} != *"self-hosted"* ]]; then
+        echo
+        echo "${COLOR_YELLOW}This is a Github Public runner - for now we are forcing max parallel K8S tests jobs to 1 for those${COLOR_RESET}"
+        echo
+        export MAX_PARALLEL_K8S_JOBS="1"
+    else
+        if [[ ${MAX_PARALLEL_K8S_JOBS=} != "" ]]; then
+            echo
+            echo "${COLOR_YELLOW}Maximum parallel k8s jobs forced vi MAX_PARALLEL_K8S_JOBS = ${MAX_PARALLEL_K8S_JOBS}${COLOR_RESET}"
+            echo
+        else
+            MAX_PARALLEL_K8S_JOBS=${CPUS_AVAILABLE_FOR_DOCKER}
+            echo
+            echo "${COLOR_YELLOW}Maximum parallel k8s jobs set to number of CPUs available for Docker = ${MAX_PARALLEL_K8S_JOBS}${COLOR_RESET}"
+            echo
+        fi
+    fi
+    export MAX_PARALLEL_K8S_JOBS
+}
+
+# Launches parallel building of images. Redirects output to log set the right directories
+# $1 - test_specification
+# $2 - bash file to execute in parallel
+function run_kubernetes_test() {
+    local kubernetes_version=$1
+    local python_version=$2
+    local job="Cluster-${kubernetes_version}-python-${python_version}"
+
+    mkdir -p "${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/${job}"
+    export JOB_LOG="${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/${job}/stdout"
+    export PARALLEL_JOB_STATUS="${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/${job}/status"
+    echo "Starting K8S tests for kubernetes version ${kubernetes_version}, python version: ${python_version}"
+    parallel --ungroup --bg --semaphore --semaphorename "${SEMAPHORE_NAME}" \
+        --jobs "${MAX_PARALLEL_K8S_JOBS}" \
+            "$(dirname "${BASH_SOURCE[0]}")/ci_setup_cluster_and_run_kubernetes_tests_single_job.sh" \
+                "${kubernetes_version}" "${python_version}" >"${JOB_LOG}" 2>&1
+}
+
+function run_k8s_tests_in_parallel() {
+    parallel::cleanup_runner
+    start_end::group_start "Monitoring k8s tests"
+    parallel::initialize_monitoring
+    parallel::monitor_progress
+
+    # In case there are more kubernetes versions than strings, we can reuse python versions so we add it twice here
+    local repeated_python_versions
+    # shellcheck disable=SC2206
+    repeated_python_versions=(${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING} ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING})
+    local index=0
+    for kubernetes_version in ${CURRENT_KUBERNETES_VERSIONS_AS_STRING}
+    do
+        index=$((index + 1))
+        python_version=${repeated_python_versions[${index}]}
+        FORWARDED_PORT_NUMBER=$((38080 + index))
+        export FORWARDED_PORT_NUMBER
+        API_SERVER_PORT=$((19090 + index))
+        export API_SERVER_PORT
+        run_kubernetes_test "${kubernetes_version}" "${python_version}" "${@}"
+    done
+    set +e
+    parallel --semaphore --semaphorename "${SEMAPHORE_NAME}" --wait
+    parallel::kill_monitor
+    set -e
+    start_end::group_end
+}
+
+initialization::set_output_color_variables
+
+parallel::make_sure_gnu_parallel_is_installed
+parallel::make_sure_python_versions_are_specified
+parallel::make_sure_kubernetes_versions_are_specified
+
+get_maximum_parallel_k8s_jobs
+
+run_k8s_tests_in_parallel "${@}"
+
+# this will exit with error code in case some of the tests failed
+parallel::print_job_summary_and_return_status_code
diff --git a/scripts/ci/kubernetes/kind-cluster-conf.yaml b/scripts/ci/kubernetes/kind-cluster-conf.yaml
index f03c1b7..4e891f8 100644
--- a/scripts/ci/kubernetes/kind-cluster-conf.yaml
+++ b/scripts/ci/kubernetes/kind-cluster-conf.yaml
@@ -20,12 +20,12 @@ apiVersion: kind.x-k8s.io/v1alpha4
 networking:
   ipFamily: ipv4
   apiServerAddress: "127.0.0.1"
-  apiServerPort: 19090
+  apiServerPort: {{API_SERVER_PORT}}
 nodes:
   - role: control-plane
   - role: worker
     extraPortMappings:
       - containerPort: 30007
-        hostPort: 8080
+        hostPort: {{FORWARDED_PORT_NUMBER}}
         listenAddress: "127.0.0.1"
         protocol: TCP
diff --git a/scripts/ci/libraries/_build_images.sh b/scripts/ci/libraries/_build_images.sh
index 771a06d..edf9e29 100644
--- a/scripts/ci/libraries/_build_images.sh
+++ b/scripts/ci/libraries/_build_images.sh
@@ -462,7 +462,6 @@ function build_images::get_docker_image_names() {
 # Also enable experimental features of docker (we need `docker manifest` command)
 function build_image::configure_docker_registry() {
     if [[ ${USE_GITHUB_REGISTRY} == "true" ]]; then
-        start_end::group_start "Determine GitHub Registry token"
         local token=""
         if [[ "${GITHUB_REGISTRY}" == "ghcr.io" ]]; then
             # For now ghcr.io can only authenticate using Personal Access Token with package access scope.
@@ -482,8 +481,6 @@ function build_image::configure_docker_registry() {
             echo
             exit 1
         fi
-        start_end::group_end
-        start_end::group_start "Logging in to GitHub Registry"
         if [[ -z "${token}" ]] ; then
             verbosity::print_info
             verbosity::print_info "Skip logging in to GitHub Registry. No Token available!"
@@ -497,14 +494,9 @@ function build_image::configure_docker_registry() {
         else
             verbosity::print_info "Skip Login to GitHub Registry ${GITHUB_REGISTRY} as token is missing"
         fi
-        start_end::group_end
-
-        start_end::group_start "Make sure experimental docker features are enabled"
         local new_config
         new_config=$(jq '.experimental = "enabled"' "${HOME}/.docker/config.json")
         echo "${new_config}" > "${HOME}/.docker/config.json"
-        start_end::group_end
-
     fi
 }
 
@@ -861,9 +853,10 @@ function build_images::build_prod_images() {
     build_images::print_build_info
 
     if [[ ${SKIP_BUILDING_PROD_IMAGE} == "true" ]]; then
-        verbosity::print_info
-        verbosity::print_info "Skip building production image. Assume the one we have is good!"
-        verbosity::print_info
+        echo
+        echo "${COLOR_YELLOW}Skip building production image. Assume the one we have is good!${COLOR_RESET}"
+        echo "${COLOR_YELLOW}You must run './breeze build-image --production-image before for all python versions!${COLOR_RESET}"
+        echo
         return
     fi
 
@@ -978,12 +971,6 @@ function build_images::build_prod_images() {
     fi
 }
 
-function build_images::build_prod_images_with_group() {
-    start_end::group_start "Build PROD images ${AIRFLOW_PROD_BUILD_IMAGE}"
-    build_images::build_prod_images
-    start_end::group_end
-}
-
 # Waits for image tag to appear in GitHub Registry, pulls it and tags with the target tag
 # Parameters:
 #  $1 - image name to wait for
@@ -1084,7 +1071,7 @@ function build_images::build_prod_images_from_locally_built_airflow_packages() {
     build_airflow_packages::build_airflow_packages
     mv "${AIRFLOW_SOURCES}/dist/"* "${AIRFLOW_SOURCES}/docker-context-files/"
 
-    build_images::build_prod_images_with_group
+    build_images::build_prod_images
 }
 
 # Useful information for people who stumble upon a pip check failure
diff --git a/scripts/ci/libraries/_docker_engine_resources.sh b/scripts/ci/libraries/_docker_engine_resources.sh
index f5ed3e6..0433359 100644
--- a/scripts/ci/libraries/_docker_engine_resources.sh
+++ b/scripts/ci/libraries/_docker_engine_resources.sh
@@ -19,10 +19,16 @@
 
 function docker_engine_resources::print_overall_stats() {
     echo
-    echo "Overall resource statistics"
+    echo "Docker statistics"
     echo
     docker stats --all --no-stream --no-trunc
-    docker run --rm --entrypoint /bin/bash "debian:buster-slim" -c "cat /proc/meminfo"
+    echo
+    echo "Memory statistics"
+    echo
+    docker run --rm --entrypoint /bin/sh "alpine:latest" -c "free -m"
+    echo
+    echo "Disk statistics"
+    echo
     df -h || true
 }
 
diff --git a/scripts/ci/libraries/_initialization.sh b/scripts/ci/libraries/_initialization.sh
index 1098e4a..191baee 100644
--- a/scripts/ci/libraries/_initialization.sh
+++ b/scripts/ci/libraries/_initialization.sh
@@ -508,14 +508,18 @@ function initialization::initialize_kubernetes_variables() {
     # Kubectl version
     export KUBECTL_VERSION=${KUBERNETES_VERSION:=${DEFAULT_KUBERNETES_VERSION}}
     # Local Kind path
-    export KIND_BINARY_PATH="${BUILD_CACHE_DIR}/bin/kind"
+    export KIND_BINARY_PATH="${BUILD_CACHE_DIR}/kubernetes-bin/${KUBERNETES_VERSION}/kind"
     readonly KIND_BINARY_PATH
     # Local Helm path
-    export HELM_BINARY_PATH="${BUILD_CACHE_DIR}/bin/helm"
+    export HELM_BINARY_PATH="${BUILD_CACHE_DIR}/kubernetes-bin/${KUBERNETES_VERSION}/helm"
     readonly HELM_BINARY_PATH
     # local Kubectl path
-    export KUBECTL_BINARY_PATH="${BUILD_CACHE_DIR}/bin/kubectl"
+    export KUBECTL_BINARY_PATH="${BUILD_CACHE_DIR}/kubernetes-bin/${KUBERNETES_VERSION}/kubectl"
     readonly KUBECTL_BINARY_PATH
+    FORWARDED_PORT_NUMBER="${FORWARDED_PORT_NUMBER:="8080"}"
+    readonly FORWARDED_PORT_NUMBER
+    API_SERVER_PORT="${API_SERVER_PORT:="19090"}"
+    readonly API_SERVER_PORT
 }
 
 function initialization::initialize_git_variables() {
diff --git a/scripts/ci/libraries/_kind.sh b/scripts/ci/libraries/_kind.sh
index a8deaac..bb883ec 100644
--- a/scripts/ci/libraries/_kind.sh
+++ b/scripts/ci/libraries/_kind.sh
@@ -21,26 +21,24 @@ function kind::get_kind_cluster_name() {
     export KIND_CLUSTER_NAME=${KIND_CLUSTER_NAME:="airflow-python-${PYTHON_MAJOR_MINOR_VERSION}-${KUBERNETES_VERSION}"}
     # Name of the KinD cluster to connect to when referred to via kubectl
     export KUBECTL_CLUSTER_NAME=kind-${KIND_CLUSTER_NAME}
-    export KUBECONFIG="${BUILD_CACHE_DIR}/.kube/config"
-    mkdir -pv "${BUILD_CACHE_DIR}/.kube/"
+    export KUBECONFIG="${BUILD_CACHE_DIR}/${KIND_CLUSTER_NAME}/.kube/config"
+    mkdir -pv "${BUILD_CACHE_DIR}/${KIND_CLUSTER_NAME}/.kube/"
     touch "${KUBECONFIG}"
 }
 
 function kind::dump_kind_logs() {
-    start_end::group_start "Dumping logs from KinD"
+    verbosity::print_info "Dumping logs from KinD"
     local DUMP_DIR_NAME DUMP_DIR
     DUMP_DIR_NAME=kind_logs_$(date "+%Y-%m-%d")_${CI_BUILD_ID}_${CI_JOB_ID}
     DUMP_DIR="/tmp/${DUMP_DIR_NAME}"
     kind --name "${KIND_CLUSTER_NAME}" export logs "${DUMP_DIR}"
-    start_end::group_end
 }
 
 function kind::make_sure_kubernetes_tools_are_installed() {
-    start_end::group_start "Make sure Kubernetes tools are installed"
     SYSTEM=$(uname -s | tr '[:upper:]' '[:lower:]')
 
     KIND_URL="https://github.com/kubernetes-sigs/kind/releases/download/${KIND_VERSION}/kind-${SYSTEM}-amd64"
-    mkdir -pv "${BUILD_CACHE_DIR}/bin"
+    mkdir -pv "${BUILD_CACHE_DIR}/kubernetes-bin/${KUBERNETES_VERSION}"
     if [[ -f "${KIND_BINARY_PATH}" ]]; then
         DOWNLOADED_KIND_VERSION=v"$(${KIND_BINARY_PATH} --version | awk '{ print $3 }')"
         echo "Currently downloaded kind version = ${DOWNLOADED_KIND_VERSION}"
@@ -87,15 +85,17 @@ function kind::make_sure_kubernetes_tools_are_installed() {
         echo "Helm version ok"
         echo
     fi
-    PATH=${PATH}:${BUILD_CACHE_DIR}/bin
-    start_end::group_end
+    PATH=${PATH}:${BUILD_CACHE_DIR}/kubernetes-bin/${KUBERNETES_VERSION}
 }
 
 function kind::create_cluster() {
-    kind create cluster \
-        --name "${KIND_CLUSTER_NAME}" \
-        --config "${AIRFLOW_SOURCES}/scripts/ci/kubernetes/kind-cluster-conf.yaml" \
-        --image "kindest/node:${KUBERNETES_VERSION}"
+        sed "s/{{FORWARDED_PORT_NUMBER}}/${FORWARDED_PORT_NUMBER}/" < \
+            "${AIRFLOW_SOURCES}/scripts/ci/kubernetes/kind-cluster-conf.yaml" | \
+        sed "s/{{API_SERVER_PORT}}/${API_SERVER_PORT}/" | \
+        kind create cluster \
+            --name "${KIND_CLUSTER_NAME}" \
+            --config - \
+            --image "kindest/node:${KUBERNETES_VERSION}"
     echo
     echo "Created cluster ${KIND_CLUSTER_NAME}"
     echo
@@ -106,7 +106,7 @@ function kind::delete_cluster() {
     echo
     echo "Deleted cluster ${KIND_CLUSTER_NAME}"
     echo
-    rm -rf "${HOME}/.kube/*"
+    rm -rf "${BUILD_CACHE_DIR}/${KIND_CLUSTER_NAME}/.kube/"
 }
 
 function kind::set_current_context() {
@@ -122,7 +122,6 @@ function kind::perform_kind_cluster_operation() {
         echo
         exit 1
     fi
-    start_end::group_start "Perform KinD cluster operation: ${1}"
 
     set -u
     OPERATION="${1}"
@@ -229,7 +228,6 @@ function kind::perform_kind_cluster_operation() {
             exit 1
         fi
     fi
-    start_end::group_end
 }
 
 function kind::check_cluster_ready_for_airflow() {
@@ -250,7 +248,6 @@ function kind::check_cluster_ready_for_airflow() {
 }
 
 function kind::build_image_for_kubernetes_tests() {
-    start_end::group_start "Build image for kubernetes tests ${AIRFLOW_PROD_IMAGE_KUBERNETES}"
     cd "${AIRFLOW_SOURCES}" || exit 1
     docker_v build --tag "${AIRFLOW_PROD_IMAGE_KUBERNETES}" . -f - <<EOF
 FROM ${AIRFLOW_PROD_IMAGE}
@@ -261,13 +258,10 @@ COPY airflow/kubernetes_executor_templates/ \${AIRFLOW_HOME}/pod_templates/
 
 EOF
     echo "The ${AIRFLOW_PROD_IMAGE_KUBERNETES} is prepared for test kubernetes deployment."
-    start_end::group_end
 }
 
 function kind::load_image_to_kind_cluster() {
-    start_end::group_start "Loading ${AIRFLOW_PROD_IMAGE_KUBERNETES} to ${KIND_CLUSTER_NAME}"
     kind load docker-image --name "${KIND_CLUSTER_NAME}" "${AIRFLOW_PROD_IMAGE_KUBERNETES}"
-    start_end::group_end
 }
 
 MAX_NUM_TRIES_FOR_HEALTH_CHECK=12
@@ -276,12 +270,7 @@ readonly MAX_NUM_TRIES_FOR_HEALTH_CHECK
 SLEEP_TIME_FOR_HEALTH_CHECK=10
 readonly SLEEP_TIME_FOR_HEALTH_CHECK
 
-FORWARDED_PORT_NUMBER=8080
-readonly FORWARDED_PORT_NUMBER
-
-
 function kind::wait_for_webserver_healthy() {
-    start_end::group_start "Waiting for webserver being healthy"
     num_tries=0
     set +e
     sleep "${SLEEP_TIME_FOR_HEALTH_CHECK}"
@@ -300,14 +289,10 @@ function kind::wait_for_webserver_healthy() {
     echo
     echo "Connection to 'airflow webserver' established on port ${FORWARDED_PORT_NUMBER}"
     echo
-    initialization::ga_env CLUSTER_FORWARDED_PORT "${FORWARDED_PORT_NUMBER}"
-    export CLUSTER_FORWARDED_PORT="${FORWARDED_PORT_NUMBER}"
     set -e
-    start_end::group_end
 }
 
 function kind::deploy_airflow_with_helm() {
-    start_end::group_start "Deploying Airflow with Helm"
     echo "Deleting namespace ${HELM_AIRFLOW_NAMESPACE}"
     kubectl delete namespace "${HELM_AIRFLOW_NAMESPACE}" >/dev/null 2>&1 || true
     kubectl delete namespace "test-namespace" >/dev/null 2>&1 || true
@@ -331,7 +316,14 @@ function kind::deploy_airflow_with_helm() {
       kubectl -n "${HELM_AIRFLOW_NAMESPACE}" patch serviceaccount default -p '{"imagePullSecrets": [{"name": "regcred"}]}'
     fi
 
-    pushd "${AIRFLOW_SOURCES}/chart" >/dev/null 2>&1 || exit 1
+    local chartdir
+    chartdir=$(mktemp -d)
+    traps::add_trap "rm -rf ${chartdir}" EXIT INT HUP TERM
+    # Copy chart to temporary directory to allow chart deployment in parallel
+    # Otherwise helm deployment will fail on renaming charts to tmpcharts
+    cp -r "${AIRFLOW_SOURCES}/chart" "${chartdir}"
+
+    pushd "${chartdir}/chart" >/dev/null 2>&1 || exit 1
     helm repo add stable https://charts.helm.sh/stable/
     helm dep update
     helm install airflow . --namespace "${HELM_AIRFLOW_NAMESPACE}" \
@@ -343,15 +335,10 @@ function kind::deploy_airflow_with_helm() {
         --set "config.api.enable_experimental_api=true"
     echo
     popd > /dev/null 2>&1|| exit 1
-    start_end::group_end
 }
 
 function kind::deploy_test_kubernetes_resources() {
-    start_end::group_start "Deploying Airflow with Helm"
-    echo
-    echo "Deploying Custom kubernetes resources"
-    echo
+    verbosity::print_info "Deploying Custom kubernetes resources"
     kubectl apply -f "scripts/ci/kubernetes/volumes.yaml" --namespace default
     kubectl apply -f "scripts/ci/kubernetes/nodeport.yaml" --namespace airflow
-    start_end::group_end
 }
diff --git a/scripts/ci/libraries/_parallel.sh b/scripts/ci/libraries/_parallel.sh
index 7239e82..935f465 100644
--- a/scripts/ci/libraries/_parallel.sh
+++ b/scripts/ci/libraries/_parallel.sh
@@ -194,7 +194,6 @@ function parallel::cleanup_runner() {
     start_end::group_end
 }
 
-
 function parallel::make_sure_python_versions_are_specified() {
     if [[ -z "${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING=}" ]]; then
         echo
@@ -203,6 +202,18 @@ function parallel::make_sure_python_versions_are_specified() {
         exit 1
     fi
     echo
-    echo "${COLOR_BLUE}Running parallel builds for those Python versions: ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}!${COLOR_RESET}"
+    echo "${COLOR_BLUE}Running parallel builds for those Python versions: ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}${COLOR_RESET}"
+    echo
+}
+
+function parallel::make_sure_kubernetes_versions_are_specified() {
+    if [[ -z "${CURRENT_KUBERNETES_VERSIONS_AS_STRING=}" ]]; then
+        echo
+        echo "${COLOR_RED}The CURRENT_KUBERNETES_VERSIONS_AS_STRING variable must be set and list K8S versions to use!${COLOR_RESET}"
+        echo
+        exit 1
+    fi
+    echo
+    echo "${COLOR_BLUE}Running parallel builds for those Kubernetes versions: ${CURRENT_KUBERNETES_VERSIONS_AS_STRING}${COLOR_RESET}"
     echo
 }
diff --git a/scripts/ci/libraries/_testing.sh b/scripts/ci/libraries/_testing.sh
index 28d1fc6..638daf5 100644
--- a/scripts/ci/libraries/_testing.sh
+++ b/scripts/ci/libraries/_testing.sh
@@ -52,7 +52,7 @@ function testing::get_docker_compose_local() {
 
 function testing::get_maximum_parallel_test_jobs() {
     docker_engine_resources::get_available_cpus_in_docker
-    if [[ ${RUNS_ON} != *"self-hosted"* ]]; then
+    if [[ -n ${RUNS_ON=} && ${RUNS_ON} != *"self-hosted"* ]]; then
         echo
         echo "${COLOR_YELLOW}This is a Github Public runner - for now we are forcing max parallel Quarantined tests jobs to 1 for those${COLOR_RESET}"
         echo
diff --git a/scripts/ci/libraries/_verbosity.sh b/scripts/ci/libraries/_verbosity.sh
index dc3ca5a..68b356d 100644
--- a/scripts/ci/libraries/_verbosity.sh
+++ b/scripts/ci/libraries/_verbosity.sh
@@ -40,7 +40,7 @@ function verbosity::restore_exit_on_error_status() {
 # printed before execution. In case of DRY_RUN_DOCKER flag set to "true"
 # show the command to execute instead of executing them
 function docker_v {
-    if [[ ${DRY_RUN_DOCKER} != "false" ]]; then
+    if [[ ${DRY_RUN_DOCKER=} != "false" ]]; then
         echo
         echo "${COLOR_CYAN}docker" "${@}" "${COLOR_RESET}"
         echo
diff --git a/scripts/ci/selective_ci_checks.sh b/scripts/ci/selective_ci_checks.sh
index cf81b33..d639714 100755
--- a/scripts/ci/selective_ci_checks.sh
+++ b/scripts/ci/selective_ci_checks.sh
@@ -61,6 +61,7 @@ function output_all_basic_variables() {
         initialization::ga_output all-python-versions \
             "$(initialization::parameters_to_json "${ALL_PYTHON_MAJOR_MINOR_VERSIONS[@]}")"
         initialization::ga_output python-versions-list-as-string "${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS[*]}"
+        initialization::ga_output kubernetes-versions-list-as-string "${CURRENT_KUBERNETES_VERSIONS[*]}"
     else
         initialization::ga_output python-versions \
             "$(initialization::parameters_to_json "${DEFAULT_PYTHON_MAJOR_MINOR_VERSION}")"
@@ -69,6 +70,7 @@ function output_all_basic_variables() {
         initialization::ga_output all-python-versions \
             "$(initialization::parameters_to_json "${DEFAULT_PYTHON_MAJOR_MINOR_VERSION}")"
         initialization::ga_output python-versions-list-as-string "${DEFAULT_PYTHON_MAJOR_MINOR_VERSION}"
+        initialization::ga_output kubernetes-versions-list-as-string "${DEFAULT_KUBERNETES_VERSION}"
     fi
     initialization::ga_output default-python-version "${DEFAULT_PYTHON_MAJOR_MINOR_VERSION}"
 
diff --git a/scripts/ci/testing/ci_run_quarantined_tests.sh b/scripts/ci/testing/ci_run_quarantined_tests.sh
index 0c1108e..57e4aca 100755
--- a/scripts/ci/testing/ci_run_quarantined_tests.sh
+++ b/scripts/ci/testing/ci_run_quarantined_tests.sh
@@ -15,6 +15,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+set -euo pipefail
 
 # Enable automated tests execution
 RUN_TESTS="true"
@@ -29,6 +30,8 @@ export SEMAPHORE_NAME
 # shellcheck source=scripts/ci/libraries/_script_init.sh
 . "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
 
+initialization::set_output_color_variables
+
 BACKEND_TEST_TYPES=(mysql postgres sqlite)
 
 # Starts test types in parallel
diff --git a/scripts/in_container/entrypoint_ci.sh b/scripts/in_container/entrypoint_ci.sh
index 5cb8d99..16aabbb 100755
--- a/scripts/in_container/entrypoint_ci.sh
+++ b/scripts/in_container/entrypoint_ci.sh
@@ -210,7 +210,7 @@ if [[ "${RUN_TESTS}" != "true" ]]; then
 fi
 set -u
 
-export RESULT_LOG_FILE="/files/test_result-${TEST_TYPE}.xml"
+export RESULT_LOG_FILE="/files/test_result-${TEST_TYPE}-${BACKEND}.xml"
 
 EXTRA_PYTEST_ARGS=(
     "--verbosity=0"
@@ -218,7 +218,7 @@ EXTRA_PYTEST_ARGS=(
     "--durations=100"
     "--cov=airflow/"
     "--cov-config=.coveragerc"
-    "--cov-report=xml:/files/coverage.xml"
+    "--cov-report=xml:/files/coverage-${TEST_TYPE}-${BACKEND}.xml"
     "--color=yes"
     "--maxfail=50"
     "--pythonwarnings=ignore::DeprecationWarning"

[airflow] 27/36: Bugfix: Task docs are not shown in the Task Instance Detail View (#15191)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit c8d1c670679995bab7cbfc0adb5c3a9969898f80
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Mon Apr 5 03:46:41 2021 +0100

    Bugfix: Task docs are not shown in the Task Instance Detail View (#15191)
    
    closes https://github.com/apache/airflow/issues/15178
    closes https://github.com/apache/airflow/issues/13761
    
    This feature was added in 2015 in https://github.com/apache/airflow/pull/74, and it expected `doc_md` (or `doc_rst` and other `doc_*` attributes) to be set via `task.doc_md` instead of being passed as an argument. However, this did not work with DAG Serialization, as we only allowed a selected set of args to be stored in the serialized version of the DAG.
    
    (cherry picked from commit e86f5ca8fa5ff22c1e1f48addc012919034c672f)
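
    As a quick illustration (a minimal sketch with made-up DAG and task ids, not part of the patch itself),
    documentation passed directly as constructor arguments now survives DAG serialization and is rendered
    in the Task Instance Details view:

        from airflow import DAG
        from airflow.operators.bash import BashOperator
        from airflow.utils.dates import days_ago

        with DAG(
            dag_id="docs_demo",
            start_date=days_ago(1),
            doc_md="### DAG Tutorial Documentation",
        ) as dag:
            BashOperator(
                task_id="documented_task",
                bash_command="echo hello",
                doc_md="### Task Tutorial Documentation",  # rendered as Markdown in the UI
            )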
---
 airflow/example_dags/tutorial.py              |  1 +
 airflow/models/baseoperator.py                | 26 ++++++++++++++++++++++++++
 airflow/serialization/schema.json             |  7 ++++++-
 airflow/www/utils.py                          |  2 +-
 airflow/www/views.py                          |  2 +-
 docs/apache-airflow/concepts.rst              |  6 +++---
 tests/serialization/test_dag_serialization.py |  9 +++++++++
 tests/www/test_utils.py                       |  4 ++--
 8 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/airflow/example_dags/tutorial.py b/airflow/example_dags/tutorial.py
index 518c801..09d6ca3 100644
--- a/airflow/example_dags/tutorial.py
+++ b/airflow/example_dags/tutorial.py
@@ -97,6 +97,7 @@ with DAG(
     You can document your task using the attributes `doc_md` (markdown),
     `doc` (plain text), `doc_rst`, `doc_json`, `doc_yaml` which gets
     rendered in the UI's Task Instance Details page.
+
     ![img](http://montcs.bloomu.edu/~bobmon/Semesters/2012-01/491/import%20soul.png)
     """
     )
diff --git a/airflow/models/baseoperator.py b/airflow/models/baseoperator.py
index 8bda785..eacea64 100644
--- a/airflow/models/baseoperator.py
+++ b/airflow/models/baseoperator.py
@@ -278,6 +278,21 @@ class BaseOperator(Operator, LoggingMixin, TaskMixin, metaclass=BaseOperatorMeta
     :param do_xcom_push: if True, an XCom is pushed containing the Operator's
         result
     :type do_xcom_push: bool
+    :param doc: Add documentation or notes to your Task objects that is visible in
+        Task Instance details View in the Webserver
+    :type doc: str
+    :param doc_md: Add documentation (in Markdown format) or notes to your Task objects
+        that is visible in Task Instance details View in the Webserver
+    :type doc_md: str
+    :param doc_rst: Add documentation (in RST format) or notes to your Task objects
+        that is visible in Task Instance details View in the Webserver
+    :type doc_rst: str
+    :param doc_json: Add documentation (in JSON format) or notes to your Task objects
+        that is visible in Task Instance details View in the Webserver
+    :type doc_json: str
+    :param doc_yaml: Add documentation (in YAML format) or notes to your Task objects
+        that is visible in Task Instance details View in the Webserver
+    :type doc_yaml: str
     """
 
     # For derived classes to define which fields will get jinjaified
@@ -381,6 +396,11 @@ class BaseOperator(Operator, LoggingMixin, TaskMixin, metaclass=BaseOperatorMeta
         inlets: Optional[Any] = None,
         outlets: Optional[Any] = None,
         task_group: Optional["TaskGroup"] = None,
+        doc: Optional[str] = None,
+        doc_md: Optional[str] = None,
+        doc_json: Optional[str] = None,
+        doc_yaml: Optional[str] = None,
+        doc_rst: Optional[str] = None,
         **kwargs,
     ):
         from airflow.models.dag import DagContext
@@ -486,6 +506,12 @@ class BaseOperator(Operator, LoggingMixin, TaskMixin, metaclass=BaseOperatorMeta
         self.executor_config = executor_config or {}
         self.do_xcom_push = do_xcom_push
 
+        self.doc_md = doc_md
+        self.doc_json = doc_json
+        self.doc_yaml = doc_yaml
+        self.doc_rst = doc_rst
+        self.doc = doc
+
         # Private attributes
         self._upstream_task_ids: Set[str] = set()
         self._downstream_task_ids: Set[str] = set()
diff --git a/airflow/serialization/schema.json b/airflow/serialization/schema.json
index 0fbe20f..3bc11ee 100644
--- a/airflow/serialization/schema.json
+++ b/airflow/serialization/schema.json
@@ -168,7 +168,12 @@
           "type": "array",
           "items": { "type": "string" },
           "uniqueItems": true
-        }
+        },
+        "doc":  { "type": "string" },
+        "doc_md":  { "type": "string" },
+        "doc_json":  { "type": "string" },
+        "doc_yaml":  { "type": "string" },
+        "doc_rst":  { "type": "string" }
       },
       "additionalProperties": true
     },
diff --git a/airflow/www/utils.py b/airflow/www/utils.py
index afd94c6..ad53436 100644
--- a/airflow/www/utils.py
+++ b/airflow/www/utils.py
@@ -321,7 +321,7 @@ def render(obj, lexer):
     return out
 
 
-def wrapped_markdown(s, css_class=None):
+def wrapped_markdown(s, css_class='rich_doc'):
     """Convert a Markdown string to HTML."""
     if s is None:
         return None
diff --git a/airflow/www/views.py b/airflow/www/views.py
index f0116b3..5f4c8c5 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -1220,7 +1220,7 @@ class Airflow(AirflowBaseView):  # noqa: D101  pylint: disable=too-many-public-m
         # Color coding the special attributes that are code
         special_attrs_rendered = {}
         for attr_name in wwwutils.get_attr_renderer():
-            if hasattr(task, attr_name):
+            if getattr(task, attr_name, None) is not None:
                 source = getattr(task, attr_name)
                 special_attrs_rendered[attr_name] = wwwutils.get_attr_renderer()[attr_name](source)
 
diff --git a/docs/apache-airflow/concepts.rst b/docs/apache-airflow/concepts.rst
index 2637b78..3de060b 100644
--- a/docs/apache-airflow/concepts.rst
+++ b/docs/apache-airflow/concepts.rst
@@ -1394,8 +1394,8 @@ Documentation & Notes
 =====================
 
 It's possible to add documentation or notes to your DAGs & task objects that
-become visible in the web interface ("Graph View" & "Tree View" for DAGs, "Task Details" for
-tasks). There are a set of special task attributes that get rendered as rich
+become visible in the web interface ("Graph View" & "Tree View" for DAGs, "Task Instance Details"
+for tasks). There are a set of special task attributes that get rendered as rich
 content if defined:
 
 ==========  ================
@@ -1430,7 +1430,7 @@ to the related tasks in Airflow.
     """
 
 This content will get rendered as markdown respectively in the "Graph View" and
-"Task Details" pages.
+"Task Instance Details" pages.
 
 .. _jinja-templating:
 
diff --git a/tests/serialization/test_dag_serialization.py b/tests/serialization/test_dag_serialization.py
index 55d2c5a..e447751 100644
--- a/tests/serialization/test_dag_serialization.py
+++ b/tests/serialization/test_dag_serialization.py
@@ -79,6 +79,7 @@ serialized_simple_dag_ground_truth = {
         },
         "is_paused_upon_creation": False,
         "_dag_id": "simple_dag",
+        "doc_md": "### DAG Tutorial Documentation",
         "fileloc": None,
         "tasks": [
             {
@@ -110,6 +111,7 @@ serialized_simple_dag_ground_truth = {
                         }
                     },
                 },
+                "doc_md": "### Task Tutorial Documentation",
             },
             {
                 "task_id": "custom_task",
@@ -170,6 +172,7 @@ def make_simple_dag():
         start_date=datetime(2019, 8, 1),
         is_paused_upon_creation=False,
         access_control={"test_role": {permissions.ACTION_CAN_READ, permissions.ACTION_CAN_EDIT}},
+        doc_md="### DAG Tutorial Documentation",
     ) as dag:
         CustomOperator(task_id='custom_task')
         BashOperator(
@@ -177,6 +180,7 @@ def make_simple_dag():
             bash_command='echo {{ task.task_id }}',
             owner='airflow',
             executor_config={"pod_override": executor_config_pod},
+            doc_md="### Task Tutorial Documentation",
         )
         return {'simple_dag': dag}
 
@@ -853,6 +857,11 @@ class TestStringifiedDAGs(unittest.TestCase):
             '_upstream_task_ids': set(),
             'depends_on_past': False,
             'do_xcom_push': True,
+            'doc': None,
+            'doc_json': None,
+            'doc_md': None,
+            'doc_rst': None,
+            'doc_yaml': None,
             'email': None,
             'email_on_failure': True,
             'email_on_retry': True,
diff --git a/tests/www/test_utils.py b/tests/www/test_utils.py
index 5ced73a..f4e50d9 100644
--- a/tests/www/test_utils.py
+++ b/tests/www/test_utils.py
@@ -240,7 +240,7 @@ class TestWrappedMarkdown(unittest.TestCase):
         )
 
         assert (
-            '<div class="None" ><table>\n<thead>\n<tr>\n<th>Job</th>\n'
+            '<div class="rich_doc" ><table>\n<thead>\n<tr>\n<th>Job</th>\n'
             '<th>Duration</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>ETL'
             '</td>\n<td>14m</td>\n</tr>\n</tbody>\n'
             '</table></div>'
@@ -255,4 +255,4 @@ class TestWrappedMarkdown(unittest.TestCase):
             """
         )
 
-        assert '<div class="None" ><h1>header</h1>\n<p>1st line\n2nd line</p></div>' == rendered
+        assert '<div class="rich_doc" ><h1>header</h1>\n<p>1st line\n2nd line</p></div>' == rendered

[airflow] 31/36: Fix url generation for TriggerDagRunOperatorLink (#14990)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 68f5b400631d761a4170b678d2f39caed877ec48
Author: Alan Ma <al...@gmail.com>
AuthorDate: Sun Apr 11 04:51:59 2021 -0700

    Fix url generation for TriggerDagRunOperatorLink (#14990)
    
    Fixes: #14675
    
    Instead of building the relative URL manually, we can leverage Flask's URL generation to account for differing Airflow base URL and HTML base URL.
    
    (cherry picked from commit aaa3bf6b44238241bd61178426b692df53770c22)
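
    A minimal sketch of the idea (assuming the Airflow webserver's Flask app and its "Airflow.graph"
    endpoint, as used in the patch): the link is generated from the registered endpoint instead of being
    hard-coded, so a non-root application base path is respected automatically:

        from urllib import parse

        from flask import url_for

        def build_graph_url(query):
            # url_for() prepends the application root, unlike a hard-coded "/graph" prefix
            return f"{url_for('Airflow.graph')}?{parse.urlencode(query)}"

        # Inside a request/app context of the webserver:
        # build_graph_url({"dag_id": "test_dag"}) -> "/graph?dag_id=test_dag"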
---
 airflow/utils/helpers.py    |  4 +++-
 tests/utils/test_helpers.py | 13 ++++++++++---
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/airflow/utils/helpers.py b/airflow/utils/helpers.py
index 69ac5a0..7fce177 100644
--- a/airflow/utils/helpers.py
+++ b/airflow/utils/helpers.py
@@ -24,6 +24,7 @@ from itertools import filterfalse, tee
 from typing import Any, Callable, Dict, Generator, Iterable, List, Optional, TypeVar
 from urllib import parse
 
+from flask import url_for
 from jinja2 import Template
 
 from airflow.configuration import conf
@@ -213,4 +214,5 @@ def build_airflow_url_with_query(query: Dict[str, Any]) -> str:
     'http://0.0.0.0:8000/base/graph?dag_id=my-task&root=&execution_date=2020-10-27T10%3A59%3A25.615587
     """
     view = conf.get('webserver', 'dag_default_view').lower()
-    return f"/{view}?{parse.urlencode(query)}"
+    url = url_for(f"Airflow.{view}")
+    return f"{url}?{parse.urlencode(query)}"
diff --git a/tests/utils/test_helpers.py b/tests/utils/test_helpers.py
index fffa2d4..bb7b453 100644
--- a/tests/utils/test_helpers.py
+++ b/tests/utils/test_helpers.py
@@ -142,10 +142,17 @@ class TestHelpers(unittest.TestCase):
 
     @conf_vars(
         {
-            ("webserver", "dag_default_view"): "custom",
+            ("webserver", "dag_default_view"): "graph",
         }
     )
     def test_build_airflow_url_with_query(self):
+        """
+        Test query generated with dag_id and params
+        """
         query = {"dag_id": "test_dag", "param": "key/to.encode"}
-        url = build_airflow_url_with_query(query)
-        assert url == "/custom?dag_id=test_dag&param=key%2Fto.encode"
+        expected_url = "/graph?dag_id=test_dag&param=key%2Fto.encode"
+
+        from airflow.www.app import cached_app
+
+        with cached_app(testing=True).test_request_context():
+            assert build_airflow_url_with_query(query) == expected_url

[airflow] 06/36: Updates 3.6 limits for latest versions of a few libraries (#15209)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 7ff9b8c4bbfe0376fcb9a8ecd3c70bbedb511b4e
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Mon Apr 5 20:25:11 2021 +0200

    Updates 3.6 limits for latest versions of a few libraries (#15209)
    
    This PR sets Python 3.6 specific limits for some of the packages
    that recently dropped support for Python 3.6 binary packages
    released via PyPI. Even if those packages did not drop
    Python 3.6 support entirely, it gets more and more difficult to
    get those packages installed (both locally and in the Docker image)
    because they require the packages to be compiled, and they often
    require a number of external dependencies to do so.
    
    This makes it difficult to automatically upgrade dependencies,
    because such an upgrade fails for Python 3.6 images if we attempt
    to do so.
    
    This PR limits several of those dependencies (dask/pandas/numpy)
    to not use the latest major releases for those packages but limits
    them to the latest released versions.
    
    Also, a comment/clarification was added to the recently (#15114) added limit
    for `pandas-gbq`. This limit has been added because of a broken
    import for the bigquery provider, but the comment about it was missing,
    so the comment is added now.
    
    (cherry picked from commit e49722859b81cfcdd7e4bb8e8aba4efb049a8590)
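
    The pins rely on standard PEP 508 environment markers, which pip evaluates against the interpreter
    performing the install. A small sketch (using the third-party `packaging` library, which is an
    assumption here and not part of this change) shows how such markers resolve:

        from packaging.markers import Marker

        for spec in ('python_version < "3.7"', 'python_version >= "3.7"'):
            print(spec, "->", Marker(spec).evaluate())

        # On Python 3.6 only the first marker is True, so pip keeps pandas below 1.2
        # and numpy below 1.20 there, while Python 3.7+ gets the unrestricted pins.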
---
 airflow/models/baseoperator.py                       |  2 +-
 airflow/providers/google/cloud/operators/dataflow.py |  6 +++---
 setup.cfg                                            |  7 ++++++-
 setup.py                                             | 11 +++++++++--
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/airflow/models/baseoperator.py b/airflow/models/baseoperator.py
index 06094a1..8bda785 100644
--- a/airflow/models/baseoperator.py
+++ b/airflow/models/baseoperator.py
@@ -1493,7 +1493,7 @@ def cross_downstream(
 class BaseOperatorLink(metaclass=ABCMeta):
     """Abstract base class that defines how we get an operator link."""
 
-    operators: ClassVar[List[Type[BaseOperator]]] = []
+    operators: ClassVar[List[Type[BaseOperator]]] = []  # pylint: disable=invalid-name
     """
     This property will be used by Airflow Plugins to find the Operators to which you want
     to assign this Operator Link
diff --git a/airflow/providers/google/cloud/operators/dataflow.py b/airflow/providers/google/cloud/operators/dataflow.py
index 92ae77e..513fea3 100644
--- a/airflow/providers/google/cloud/operators/dataflow.py
+++ b/airflow/providers/google/cloud/operators/dataflow.py
@@ -43,9 +43,9 @@ class CheckJobRunning(Enum):
     WaitForRun - wait for job to finish and then continue with new job
     """
 
-    IgnoreJob = 1
-    FinishIfRunning = 2
-    WaitForRun = 3
+    IgnoreJob = 1  # pylint: disable=invalid-name
+    FinishIfRunning = 2  # pylint: disable=invalid-name
+    WaitForRun = 3  # pylint: disable=invalid-name
 
 
 class DataflowConfiguration:
diff --git a/setup.cfg b/setup.cfg
index fbb2276..ac103ba 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -110,7 +110,12 @@ install_requires =
     markdown>=2.5.2, <4.0
     markupsafe>=1.1.1, <2.0
     marshmallow-oneofschema>=2.0.1
-    pandas>=0.17.1, <2.0
+    # Numpy stopped releasing 3.6 binaries for 1.20.* series.
+    numpy<1.20;python_version<"3.7"
+    numpy;python_version>="3.7"
+    # Pandas stopped releasing 3.6 binaries for 1.2.* series.
+    pandas>=0.17.1, <1.2;python_version<"3.7"
+    pandas>=0.17.1, <2.0;python_version>="3.7"
     pendulum~=2.0
     pep562~=1.0;python_version<"3.7"
     psutil>=4.2.0, <6.0.0
diff --git a/setup.py b/setup.py
index 0f421bc..51bd9be 100644
--- a/setup.py
+++ b/setup.py
@@ -237,7 +237,12 @@ cgroups = [
 cloudant = [
     'cloudant>=2.0',
 ]
-dask = ['cloudpickle>=1.4.1, <1.5.0', 'distributed>=2.11.1, <2.20']
+dask = [
+    'cloudpickle>=1.4.1, <1.5.0',
+    'dask<2021.3.1;python_version<"3.7"',  # dask stopped supporting python 3.6 in 2021.3.1 version
+    'dask>=2.9.0;python_version>="3.7"',
+    'distributed>=2.11.1, <2.20',
+]
 databricks = [
     'requests>=2.20.0, <3',
 ]
@@ -313,7 +318,9 @@ google = [
     'google-cloud-workflows>=0.1.0,<2.0.0',
     'grpcio-gcp>=0.2.2',
     'json-merge-patch~=0.2',
-    'pandas-gbq',
+    # pandas-gbq 0.15.0 release broke google provider's bigquery import
+    # _check_google_client_version (airflow/providers/google/cloud/hooks/bigquery.py:49)
+    'pandas-gbq<0.15.0',
     'plyvel',
 ]
 grpc = [

[airflow] 09/36: Fix celery executor bug trying to call len on map (#14883)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 22b2a800ba81e2a90ef40b7a92eb80d4eb67acb2
Author: Ryan Hatter <25...@users.noreply.github.com>
AuthorDate: Tue Apr 6 05:21:38 2021 -0400

    Fix celery executor bug trying to call len on map (#14883)
    
    Co-authored-by: RNHTTR <ry...@wiftapp.com>
    (cherry picked from commit 4ee442970873ba59ee1d1de3ac78ef8e33666e0f)
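
    The underlying Python behaviour, shown with a throwaway sketch (hypothetical task data, not the
    executor code itself): map() returns a lazy iterator, so calling len() on it raises TypeError, while
    materializing it with list() - as this fix does - works as expected:

        import operator

        celery_tasks = {"a": ("async-result-a", "queue-a"), "b": ("async-result-b", "queue-b")}

        lazy = map(operator.itemgetter(0), celery_tasks.values())
        # len(lazy)  # would raise: TypeError: object of type 'map' has no len()

        materialized = list(map(operator.itemgetter(0), celery_tasks.values()))
        print(len(materialized))  # 2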
---
 airflow/executors/celery_executor.py    | 22 ++++++++++-----------
 tests/executors/test_celery_executor.py | 35 +++++++++++++++++++++++----------
 2 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/airflow/executors/celery_executor.py b/airflow/executors/celery_executor.py
index a670294..2d0e915 100644
--- a/airflow/executors/celery_executor.py
+++ b/airflow/executors/celery_executor.py
@@ -476,7 +476,7 @@ class CeleryExecutor(BaseExecutor):
             return tis
 
         states_by_celery_task_id = self.bulk_state_fetcher.get_many(
-            map(operator.itemgetter(0), celery_tasks.values())
+            list(map(operator.itemgetter(0), celery_tasks.values()))
         )
 
         adopted = []
@@ -526,10 +526,6 @@ def fetch_celery_task_state(async_result: AsyncResult) -> Tuple[str, Union[str,
         return async_result.task_id, ExceptionWithTraceback(e, exception_traceback), None
 
 
-def _tasks_list_to_task_ids(async_tasks) -> Set[str]:
-    return {a.task_id for a in async_tasks}
-
-
 class BulkStateFetcher(LoggingMixin):
     """
     Gets status for many Celery tasks using the best method available
@@ -543,20 +539,22 @@ class BulkStateFetcher(LoggingMixin):
         super().__init__()
         self._sync_parallelism = sync_parralelism
 
+    def _tasks_list_to_task_ids(self, async_tasks) -> Set[str]:
+        return {a.task_id for a in async_tasks}
+
     def get_many(self, async_results) -> Mapping[str, EventBufferValueType]:
         """Gets status for many Celery tasks using the best method available."""
         if isinstance(app.backend, BaseKeyValueStoreBackend):
             result = self._get_many_from_kv_backend(async_results)
-            return result
-        if isinstance(app.backend, DatabaseBackend):
+        elif isinstance(app.backend, DatabaseBackend):
             result = self._get_many_from_db_backend(async_results)
-            return result
-        result = self._get_many_using_multiprocessing(async_results)
-        self.log.debug("Fetched %d states for %d task", len(result), len(async_results))
+        else:
+            result = self._get_many_using_multiprocessing(async_results)
+        self.log.debug("Fetched %d state(s) for %d task(s)", len(result), len(async_results))
         return result
 
     def _get_many_from_kv_backend(self, async_tasks) -> Mapping[str, EventBufferValueType]:
-        task_ids = _tasks_list_to_task_ids(async_tasks)
+        task_ids = self._tasks_list_to_task_ids(async_tasks)
         keys = [app.backend.get_key_for_task(k) for k in task_ids]
         values = app.backend.mget(keys)
         task_results = [app.backend.decode_result(v) for v in values if v]
@@ -565,7 +563,7 @@ class BulkStateFetcher(LoggingMixin):
         return self._prepare_state_and_info_by_task_dict(task_ids, task_results_by_task_id)
 
     def _get_many_from_db_backend(self, async_tasks) -> Mapping[str, EventBufferValueType]:
-        task_ids = _tasks_list_to_task_ids(async_tasks)
+        task_ids = self._tasks_list_to_task_ids(async_tasks)
         session = app.backend.ResultSession()
         task_cls = getattr(app.backend, "task_cls", TaskDb)
         with session_cleanup(session):
diff --git a/tests/executors/test_celery_executor.py b/tests/executors/test_celery_executor.py
index 944fa49..4f93007 100644
--- a/tests/executors/test_celery_executor.py
+++ b/tests/executors/test_celery_executor.py
@@ -414,7 +414,9 @@ class TestBulkStateFetcher(unittest.TestCase):
     def test_should_support_kv_backend(self, mock_mget):
         with _prepare_app():
             mock_backend = BaseKeyValueStoreBackend(app=celery_executor.app)
-            with mock.patch.object(celery_executor.app, 'backend', mock_backend):
+            with mock.patch.object(celery_executor.app, 'backend', mock_backend), self.assertLogs(
+                "airflow.executors.celery_executor.BulkStateFetcher", level="DEBUG"
+            ) as cm:
                 fetcher = BulkStateFetcher()
                 result = fetcher.get_many(
                     [
@@ -429,6 +431,9 @@ class TestBulkStateFetcher(unittest.TestCase):
         mock_mget.assert_called_once_with(mock.ANY)
 
         assert result == {'123': ('SUCCESS', None), '456': ("PENDING", None)}
+        assert [
+            'DEBUG:airflow.executors.celery_executor.BulkStateFetcher:Fetched 2 state(s) for 2 task(s)'
+        ] == cm.output
 
     @mock.patch("celery.backends.database.DatabaseBackend.ResultSession")
     @pytest.mark.integration("redis")
@@ -438,21 +443,26 @@ class TestBulkStateFetcher(unittest.TestCase):
         with _prepare_app():
             mock_backend = DatabaseBackend(app=celery_executor.app, url="sqlite3://")
 
-            with mock.patch.object(celery_executor.app, 'backend', mock_backend):
+            with mock.patch.object(celery_executor.app, 'backend', mock_backend), self.assertLogs(
+                "airflow.executors.celery_executor.BulkStateFetcher", level="DEBUG"
+            ) as cm:
                 mock_session = mock_backend.ResultSession.return_value  # pylint: disable=no-member
                 mock_session.query.return_value.filter.return_value.all.return_value = [
                     mock.MagicMock(**{"to_dict.return_value": {"status": "SUCCESS", "task_id": "123"}})
                 ]
 
-        fetcher = BulkStateFetcher()
-        result = fetcher.get_many(
-            [
-                mock.MagicMock(task_id="123"),
-                mock.MagicMock(task_id="456"),
-            ]
-        )
+                fetcher = BulkStateFetcher()
+                result = fetcher.get_many(
+                    [
+                        mock.MagicMock(task_id="123"),
+                        mock.MagicMock(task_id="456"),
+                    ]
+                )
 
         assert result == {'123': ('SUCCESS', None), '456': ("PENDING", None)}
+        assert [
+            'DEBUG:airflow.executors.celery_executor.BulkStateFetcher:Fetched 2 state(s) for 2 task(s)'
+        ] == cm.output
 
     @pytest.mark.integration("redis")
     @pytest.mark.integration("rabbitmq")
@@ -461,7 +471,9 @@ class TestBulkStateFetcher(unittest.TestCase):
         with _prepare_app():
             mock_backend = mock.MagicMock(autospec=BaseBackend)
 
-            with mock.patch.object(celery_executor.app, 'backend', mock_backend):
+            with mock.patch.object(celery_executor.app, 'backend', mock_backend), self.assertLogs(
+                "airflow.executors.celery_executor.BulkStateFetcher", level="DEBUG"
+            ) as cm:
                 fetcher = BulkStateFetcher(1)
                 result = fetcher.get_many(
                     [
@@ -471,3 +483,6 @@ class TestBulkStateFetcher(unittest.TestCase):
                 )
 
         assert result == {'123': ('SUCCESS', None), '456': ("PENDING", None)}
+        assert [
+            'DEBUG:airflow.executors.celery_executor.BulkStateFetcher:Fetched 2 state(s) for 2 task(s)'
+        ] == cm.output

[airflow] 30/36: Add documentation create/update community providers (#15061)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 096435eca8bc7a1b075414507ca56fed37fb7d28
Author: Marcos Marx <ma...@users.noreply.github.com>
AuthorDate: Sat Apr 3 10:01:04 2021 -0300

    Add documentation create/update community providers (#15061)
    
    (cherry picked from commit 932f8c2e9360de6371031d4d71df00867a2776e6)
---
 .../howto/create-update-providers.rst              | 301 +++++++++++++++++++++
 docs/apache-airflow-providers/index.rst            |  14 +-
 2 files changed, 314 insertions(+), 1 deletion(-)

diff --git a/docs/apache-airflow-providers/howto/create-update-providers.rst b/docs/apache-airflow-providers/howto/create-update-providers.rst
new file mode 100644
index 0000000..47ebb77
--- /dev/null
+++ b/docs/apache-airflow-providers/howto/create-update-providers.rst
@@ -0,0 +1,301 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Community Providers
+===================
+
+.. contents:: :local:
+
+How to create a new community provider
+----------------------------------------
+
+This document gathers the necessary steps to create a new community provider and also guidelines for updating
+the existing ones. You should be aware that providers may have distinctions that may not be covered in
+this guide. The sequence described was designed to follow the most linear flow possible when developing a
+new provider.
+
+Another recommendation that will help you is to look for a provider that works similarly to yours. That way it will
+help you to set up tests and other dependencies.
+
+First, you need to set up your local development environment. See `Contribution Quick Start <https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst>`_
+if you have not set up your local environment yet. We recommend using ``breeze`` to develop locally. This way you
+will easily be able to have an environment similar to the one used by the GitHub CI workflow.
+
+  .. code-block:: bash
+
+      ./breeze
+
+Using the code above you will set up Docker containers. These containers mount your local code to internal volumes.
+In this way, the changes made in your IDE are already applied to the code inside the container and tests can
+be carried out quickly.
+
+In this how-to guide our example provider name will be ``<NEW_PROVIDER>``.
+When you see this placeholder you must replace it with your provider name.
+
+
+Initial Code and Unit Tests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Most likely you have developed a version of the provider using some local customization and now you need to
+transfer this code to the Airflow project. Below is a description of the initial code structure that
+the provider may need. Understand that not all providers will need all the components described in this structure.
+If you still have doubts about building your provider, we recommend that you read the initial provider guide and
+open an issue on GitHub so the community can help you.
+
+  .. code-block:: bash
+
+      airflow/
+      ├── providers/<NEW_PROVIDER>/
+      │   ├── __init__.py
+      │   ├── example_dags/
+      │   │   ├── __init__.py
+      │   │   └── example_<NEW_PROVIDER>.py
+      │   ├── hooks/
+      │   │   ├── __init__.py
+      │   │   └── <NEW_PROVIDER>.py
+      │   ├── operators/
+      │   │   ├── __init__.py
+      │   │   └── <NEW_PROVIDER>.py
+      │   ├── sensors/
+      │   │   ├── __init__.py
+      │   │   └── <NEW_PROVIDER>.py
+      │   └── transfers/
+      │       ├── __init__.py
+      │       └── <NEW_PROVIDER>.py
+      └── tests/providers/<NEW_PROVIDER>/
+          ├── __init__.py
+          ├── hooks/
+          │   ├── __init__.py
+          │   └── test_<NEW_PROVIDER>.py
+          ├── operators/
+          │   ├── __init__.py
+          │   ├── test_<NEW_PROVIDER>.py
+          │   └── test_<NEW_PROVIDER>_system.py
+          ├── sensors/
+          │   ├── __init__.py
+          │   └── test_<NEW_PROVIDER>.py
+          └── transfers/
+              ├── __init__.py
+              └── test_<NEW_PROVIDER>.py
+
+Considering that you have already transferred your provider's code to the above structure, it will now be necessary
+to create unit tests for each component you created. In the example below, an environment has already been set up
+using breeze, and unit tests are run for the new Hook.
+
+  .. code-block:: bash
+
+      root@fafd8d630e46:/opt/airflow# python -m pytest tests/providers/<NEW_PROVIDER>/hooks/test_<NEW_PROVIDER>.py
+
+Update Airflow validation tests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+There are some tests that Airflow performs to ensure consistency related to the providers.
+
+  .. code-block:: bash
+
+      airflow/scripts/in_container/
+      └── run_install_and_test_provider_packages.sh
+      tests/core/
+      └── test_providers_manager.py
+
+Change the expected number of providers, hooks and connections, if needed, in the ``run_install_and_test_provider_packages.sh`` file.
+
+Add your provider information in the following variables in ``test_providers_manager.py``:
+
+- add your provider to the ``ALL_PROVIDERS`` list;
+- add your provider to ``CONNECTIONS_LIST`` if your provider creates a new connection type.
+
+
+Integration tests
+^^^^^^^^^^^^^^^^^
+
+See `Airflow Integration Tests <https://github.com/apache/airflow/blob/master/TESTING.rst#airflow-integration-tests>`_
+
+
+Documentation
+^^^^^^^^^^^^^
+
+An important part of building a new provider is the documentation.
+Some steps for the documentation occur automatically via ``pre-commit``; see the `Installing pre-commit guide <https://github.com/apache/airflow/blob/master/CONTRIBUTORS_QUICK_START.rst#pre-commit>`_
+
+  .. code-block:: bash
+
+      airflow/
+      ├── INSTALL
+      ├── CONTRIBUTING.rst
+      ├── setup.py
+      ├── docs/
+      │   ├── spelling_wordlist.txt
+      │   ├── apache-airflow/
+      │   │   └── extra-packages-ref.rst
+      │   ├── integration-logos/<NEW_PROVIDER>/
+      │   │   └── <NEW_PROVIDER>.png
+      │   └── apache-airflow-providers-<NEW_PROVIDER>/
+      │       ├── index.rst
+      │       ├── commits.rst
+      │       ├── connections.rst
+      │       └── operators/
+      │           └── <NEW_PROVIDER>.rst
+      └── providers/
+          ├── dependencies.json
+          └── <NEW_PROVIDER>/
+              ├── provider.yaml
+              └── CHANGELOG.rst
+
+
+Files automatically updated by pre-commit:
+
+- ``airflow/providers/dependencies.json``
+- ``INSTALL``
+
+Files automatically created when the provider is released:
+
+- ``docs/apache-airflow-providers-<NEW_PROVIDER>/commits.rst``
+- ``/airflow/providers/<NEW_PROVIDER>/CHANGELOG``
+
+There is a chance that your provider's name is not a common English word.
+In this case it is necessary to add it to the file ``docs/spelling_wordlist.txt``. This file begins with capitalized
+words, followed by lowercase words in the second block.
+
+  .. code-block:: bash
+
+    Namespace
+    Neo4j
+    Nextdoor
+    <NEW_PROVIDER> (new line)
+    Nones
+    NotFound
+    Nullable
+    ...
+    neo4j
+    neq
+    networkUri
+    <NEW_PROVIDER> (new line)
+    nginx
+    nobr
+    nodash
+
+Add your provider dependencies to the **PROVIDERS_REQUIREMENTS** variable in ``setup.py``. If your provider doesn't have
+any dependencies, add an empty list.
+
+  .. code-block:: python
+
+      PROVIDERS_REQUIREMENTS: Dict[str, List[str]] = {
+          ...
+          'microsoft.winrm': winrm,
+          'mongo': mongo,
+          'mysql': mysql,
+          'neo4j': neo4j,
+          '<NEW_PROVIDER>': [],
+          'odbc': odbc,
+          ...
+          }
+
+In ``CONTRIBUTING.rst`` add:
+
+- your provider name in the list in the **Extras** section
+- your provider dependencies in the **Provider Packages** section table, only if your provider has external dependencies.
+
+In the ``docs/apache-airflow-providers-<NEW_PROVIDER>/connections.rst``:
+
+- add information on how to configure a connection for your provider.
+
+In the ``docs/apache-airflow-providers-<NEW_PROVIDER>/operators/<NEW_PROVIDER>.rst``:
+
+- add information on how to use the Operator. It's important to add examples and additional information if your Operator has extra parameters.
+
+  .. code-block:: RST
+
+      .. _howto/operator:NewProviderOperator:
+
+      NewProviderOperator
+      ===================
+
+      Use the :class:`~airflow.providers.<NEW_PROVIDER>.operators.NewProviderOperator` to do something
+      amazing with Airflow!
+
+      Using the Operator
+      ^^^^^^^^^^^^^^^^^^
+
+      The NewProviderOperator requires a ``connection_id`` and this other awesome parameter.
+      You can see an example below:
+
+      .. exampleinclude:: /../../airflow/providers/<NEW_PROVIDER>/example_dags/example_<NEW_PROVIDER>.py
+          :language: python
+          :start-after: [START howto_operator_<NEW_PROVIDER>]
+          :end-before: [END howto_operator_<NEW_PROVIDER>]
+
+
+In the ``docs/apache-airflow-providers-<NEW_PROVIDER>/index.rst``:
+
+- add all information about the purpose of your provider. It is recommended to check another provider's documentation to help you complete this document as well as possible.
+
+In ``airflow/providers/<NEW_PROVIDER>/provider.yaml`` add the information for your provider:
+
+  .. code-block:: yaml
+
+      package-name: apache-airflow-providers-<NEW_PROVIDER>
+      name: <NEW_PROVIDER>
+      description: |
+        `<NEW_PROVIDER> <https://example.io/>`__
+      versions:
+        - 1.0.0
+
+      integrations:
+        - integration-name: <NEW_PROVIDER>
+          external-doc-url: https://www.example.io/
+          logo: /integration-logos/<NEW_PROVIDER>/<NEW_PROVIDER>.png
+          how-to-guide:
+            - /docs/apache-airflow-providers-<NEW_PROVIDER>/operators/<NEW_PROVIDER>.rst
+          tags: [service]
+
+      operators:
+        - integration-name: <NEW_PROVIDER>
+          python-modules:
+            - airflow.providers.<NEW_PROVIDER>.operators.<NEW_PROVIDER>
+
+      hooks:
+        - integration-name: <NEW_PROVIDER>
+          python-modules:
+            - airflow.providers.<NEW_PROVIDER>.hooks.<NEW_PROVIDER>
+
+      sensors:
+        - integration-name: <NEW_PROVIDER>
+          python-modules:
+            - airflow.providers.<NEW_PROVIDER>.sensors.<NEW_PROVIDER>
+
+      hook-class-names:
+        - airflow.providers.<NEW_PROVIDER>.hooks.<NEW_PROVIDER>.NewProviderHook
+
+You only need to add ``hook-class-names`` in case you have some hooks that have customized UI behavior.
+For more information see `Custom connection types <http://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#custom-connection-types>`_
+
+
+After changing and creating these files you can build the documentation locally. The two commands below will
+accomplish this. The first will build your provider's documentation. The second will ensure that the
+main Airflow documentation that involves some steps with the providers is also working.
+
+  .. code-block:: bash
+
+    ./breeze build-docs -- --package-filter apache-airflow-providers-<NEW_PROVIDER>
+    ./breeze build-docs -- --package-filter apache-airflow
+
+How to update a community provider
+----------------------------------
+
+See `Provider packages versioning <https://github.com/apache/airflow/blob/master/dev/README_RELEASE_PROVIDER_PACKAGES.md#provider-packages-versioning>`_
diff --git a/docs/apache-airflow-providers/index.rst b/docs/apache-airflow-providers/index.rst
index 81e02fa..43c1687 100644
--- a/docs/apache-airflow-providers/index.rst
+++ b/docs/apache-airflow-providers/index.rst
@@ -54,6 +54,11 @@ provider packages are automatically documented in the release notes of every pro
     Those are the same providers as for 2.0 but automatically back-ported to work for Airflow 1.10. The
     last release of backport providers was done on March 17, 2021.
 
+Creating and maintaining community providers
+""""""""""""""""""""""""""""""""""""""""""""
+
+See :doc:`howto/create-update-providers` for more information.
+
 
 Provider packages functionality
 '''''''''''''''''''''''''''''''
@@ -242,7 +247,7 @@ Example ``myproviderpackage/somemodule.py``:
 
 **How do provider packages work under the hood?**
 
-When running airflow with your provider package, there will be (at least) three components to your airflow installation:
+When running Airflow with your provider package, there will be (at least) three components to your airflow installation:
 
 * The installation itself (for example, a ``venv`` where you installed airflow with ``pip install apache-airflow``)
   together with the related files (e.g. ``dags`` folder)
@@ -296,6 +301,12 @@ The Community only accepts providers that are generic enough, are well documente
 and with capabilities of being tested by people in the community. So we might not always be in the
 position to accept such contributions.
 
+
+After you think that your provider matches the expectations above, you can read
+:doc:`howto/create-update-providers` to check all prerequisites for a new
+community provider and discuss it on the `Devlist <http://airflow.apache.org/community/>`_.
+
+
 However, in case you have your own, specific provider, which you can maintain on your own or by your
 team, you are free to publish the providers in whatever form you find appropriate. The custom and
 community-managed providers have exactly the same capabilities.
@@ -323,3 +334,4 @@ Content
 
     Packages <packages-ref>
     Operators and hooks <operators-and-hooks-ref/index>
+    How to create and update community providers <howto/create-update-providers>

[airflow] 24/36: Bugfix: Fix overriding `pod_template_file` in KubernetesExecutor (#15197)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 18a153c04e67c4e041bb077a280c85344fb62dc4
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Mon Apr 5 16:56:00 2021 +0100

    Bugfix: Fix overriding `pod_template_file` in KubernetesExecutor (#15197)
    
    This feature was added in https://github.com/apache/airflow/pull/11784 but
    it was broken as it got `pod_template_override` from `executor_config`
    instead of `pod_template_file`.
    
    closes #14199
    
    (cherry picked from commit 5606137ba32c0daa87d557301d82f7f2bdc0b0a4)
---
 .../example_kubernetes_executor_config.py          |  3 +-
 airflow/executors/kubernetes_executor.py           |  2 +-
 .../basic_template.yaml                            |  4 +-
 docs/apache-airflow/executor/kubernetes.rst        |  2 +-
 .../basic_template.yaml                            | 34 ++++++++
 tests/executors/test_kubernetes_executor.py        | 91 +++++++++++++++++++++-
 6 files changed, 130 insertions(+), 6 deletions(-)

diff --git a/airflow/example_dags/example_kubernetes_executor_config.py b/airflow/example_dags/example_kubernetes_executor_config.py
index cbd69cb..5290dd8 100644
--- a/airflow/example_dags/example_kubernetes_executor_config.py
+++ b/airflow/example_dags/example_kubernetes_executor_config.py
@@ -24,6 +24,7 @@ import os
 from airflow import DAG
 from airflow.example_dags.libs.helper import print_stuff
 from airflow.operators.python import PythonOperator
+from airflow.settings import AIRFLOW_HOME
 from airflow.utils.dates import days_ago
 
 default_args = {
@@ -110,7 +111,7 @@ try:
             task_id="task_with_template",
             python_callable=print_stuff,
             executor_config={
-                "pod_template_file": "/usr/local/airflow/pod_templates/basic_template.yaml",
+                "pod_template_file": os.path.join(AIRFLOW_HOME, "pod_templates/basic_template.yaml"),
                 "pod_override": k8s.V1Pod(metadata=k8s.V1ObjectMeta(labels={"release": "stable"})),
             },
         )
diff --git a/airflow/executors/kubernetes_executor.py b/airflow/executors/kubernetes_executor.py
index 7e3d82b..ec7cbf7 100644
--- a/airflow/executors/kubernetes_executor.py
+++ b/airflow/executors/kubernetes_executor.py
@@ -496,7 +496,7 @@ class KubernetesExecutor(BaseExecutor, LoggingMixin):
             return
 
         if executor_config:
-            pod_template_file = executor_config.get("pod_template_override", None)
+            pod_template_file = executor_config.get("pod_template_file", None)
         else:
             pod_template_file = None
         if not self.task_queue:
diff --git a/airflow/kubernetes_executor_templates/basic_template.yaml b/airflow/kubernetes_executor_templates/basic_template.yaml
index a953867..a6eb83f 100644
--- a/airflow/kubernetes_executor_templates/basic_template.yaml
+++ b/airflow/kubernetes_executor_templates/basic_template.yaml
@@ -69,8 +69,8 @@ spec:
         defaultMode: 420
   restartPolicy: Never
   terminationGracePeriodSeconds: 30
-  serviceAccountName: airflow-worker-serviceaccount
-  serviceAccount: airflow-worker-serviceaccount
+  serviceAccountName: airflow-worker
+  serviceAccount: airflow-worker
   securityContext:
     runAsUser: 50000
     fsGroup: 50000
diff --git a/docs/apache-airflow/executor/kubernetes.rst b/docs/apache-airflow/executor/kubernetes.rst
index 217a29c..61d13f4 100644
--- a/docs/apache-airflow/executor/kubernetes.rst
+++ b/docs/apache-airflow/executor/kubernetes.rst
@@ -125,7 +125,7 @@ name ``base`` and a second container containing your desired sidecar.
     :end-before: [END task_with_sidecar]
 
 You can also create custom ``pod_template_file`` on a per-task basis so that you can recycle the same base values between multiple tasks.
-This will replace the default ``pod_template_file`` named in the airflow.cfg and then override that template using the ``pod_override_spec``.
+This will replace the default ``pod_template_file`` named in the airflow.cfg and then override that template using the ``pod_override``.
 
 Here is an example of a task with both features:
 
diff --git a/tests/executors/kubernetes_executor_template_files/basic_template.yaml b/tests/executors/kubernetes_executor_template_files/basic_template.yaml
new file mode 100644
index 0000000..1fb00f2
--- /dev/null
+++ b/tests/executors/kubernetes_executor_template_files/basic_template.yaml
@@ -0,0 +1,34 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+---
+kind: Pod
+apiVersion: v1
+metadata:
+  name: dummy-name-dont-delete
+  namespace: dummy-name-dont-delete
+  labels:
+    mylabel: foo
+spec:
+  containers:
+    - name: base
+      image: dummy-name-dont-delete
+  securityContext:
+    runAsUser: 50000
+    fsGroup: 50000
+  imagePullSecrets:
+    - name: airflow-registry
+  schedulerName: default-scheduler
diff --git a/tests/executors/test_kubernetes_executor.py b/tests/executors/test_kubernetes_executor.py
index 68b0006..8d3d5b4 100644
--- a/tests/executors/test_kubernetes_executor.py
+++ b/tests/executors/test_kubernetes_executor.py
@@ -15,6 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 #
+import pathlib
 import random
 import re
 import string
@@ -22,6 +23,7 @@ import unittest
 from datetime import datetime
 from unittest import mock
 
+import pytest
 from kubernetes.client import models as k8s
 from urllib3 import HTTPResponse
 
@@ -39,7 +41,7 @@ try:
         get_base_pod_from_template,
     )
     from airflow.kubernetes import pod_generator
-    from airflow.kubernetes.pod_generator import PodGenerator
+    from airflow.kubernetes.pod_generator import PodGenerator, datetime_to_label_safe_datestring
     from airflow.utils.state import State
 except ImportError:
     AirflowKubernetesScheduler = None  # type: ignore
@@ -215,6 +217,93 @@ class TestKubernetesExecutor(unittest.TestCase):
 
         assert list(executor.event_buffer.values())[0][1] == "Invalid executor_config passed"
 
+    @pytest.mark.execution_timeout(10)
+    @unittest.skipIf(AirflowKubernetesScheduler is None, 'kubernetes python package is not installed')
+    @mock.patch('airflow.kubernetes.pod_launcher.PodLauncher.run_pod_async')
+    @mock.patch('airflow.executors.kubernetes_executor.get_kube_client')
+    def test_pod_template_file_override_in_executor_config(self, mock_get_kube_client, mock_run_pod_async):
+        current_folder = pathlib.Path(__file__).parent.absolute()
+        template_file = str(
+            (current_folder / "kubernetes_executor_template_files" / "basic_template.yaml").absolute()
+        )
+
+        mock_kube_client = mock.patch('kubernetes.client.CoreV1Api', autospec=True)
+        mock_get_kube_client.return_value = mock_kube_client
+
+        with conf_vars({('kubernetes', 'pod_template_file'): ''}):
+            executor = self.kubernetes_executor
+            executor.start()
+
+            assert executor.event_buffer == {}
+            assert executor.task_queue.empty()
+
+            execution_date = datetime.utcnow()
+
+            executor.execute_async(
+                key=('dag', 'task', execution_date, 1),
+                queue=None,
+                command=['airflow', 'tasks', 'run', 'true', 'some_parameter'],
+                executor_config={
+                    "pod_template_file": template_file,
+                    "pod_override": k8s.V1Pod(
+                        metadata=k8s.V1ObjectMeta(labels={"release": "stable"}),
+                        spec=k8s.V1PodSpec(
+                            containers=[k8s.V1Container(name="base", image="airflow:3.6")],
+                        ),
+                    ),
+                },
+            )
+
+            assert not executor.task_queue.empty()
+            task = executor.task_queue.get_nowait()
+            _, _, expected_executor_config, expected_pod_template_file = task
+
+            # Test that the correct values have been put to queue
+            assert expected_executor_config.metadata.labels == {'release': 'stable'}
+            assert expected_pod_template_file == template_file
+
+            self.kubernetes_executor.kube_scheduler.run_next(task)
+            mock_run_pod_async.assert_called_once_with(
+                k8s.V1Pod(
+                    api_version="v1",
+                    kind="Pod",
+                    metadata=k8s.V1ObjectMeta(
+                        name=mock.ANY,
+                        namespace="default",
+                        annotations={
+                            'dag_id': 'dag',
+                            'execution_date': execution_date.isoformat(),
+                            'task_id': 'task',
+                            'try_number': '1',
+                        },
+                        labels={
+                            'airflow-worker': '5',
+                            'airflow_version': mock.ANY,
+                            'dag_id': 'dag',
+                            'execution_date': datetime_to_label_safe_datestring(execution_date),
+                            'kubernetes_executor': 'True',
+                            'mylabel': 'foo',
+                            'release': 'stable',
+                            'task_id': 'task',
+                            'try_number': '1',
+                        },
+                    ),
+                    spec=k8s.V1PodSpec(
+                        containers=[
+                            k8s.V1Container(
+                                name="base",
+                                image="airflow:3.6",
+                                args=['airflow', 'tasks', 'run', 'true', 'some_parameter'],
+                                env=[k8s.V1EnvVar(name='AIRFLOW_IS_K8S_EXECUTOR_POD', value='True')],
+                            )
+                        ],
+                        image_pull_secrets=[k8s.V1LocalObjectReference(name='airflow-registry')],
+                        scheduler_name='default-scheduler',
+                        security_context=k8s.V1PodSecurityContext(fs_group=50000, run_as_user=50000),
+                    ),
+                )
+            )
+
     @mock.patch('airflow.executors.kubernetes_executor.KubernetesJobWatcher')
     @mock.patch('airflow.executors.kubernetes_executor.get_kube_client')
     def test_change_state_running(self, mock_get_kube_client, mock_kubernetes_job_watcher):
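
The behaviour restored by this fix is the per-task ``pod_template_file`` override combined with
``pod_override``, as exercised in the test above. A condensed, self-contained sketch of how a DAG
author uses it (the template path and callable are illustrative):

.. code-block:: python

    from datetime import datetime

    from kubernetes.client import models as k8s

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def my_callable():
        print("running in a pod built from the per-task template")


    with DAG(
        dag_id="pod_template_override_example",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
    ):
        PythonOperator(
            task_id="task_with_template",
            python_callable=my_callable,
            executor_config={
                # Replaces the default pod_template_file from airflow.cfg for this task only ...
                "pod_template_file": "/path/to/basic_template.yaml",
                # ... and this override is then applied on top of the rendered template.
                "pod_override": k8s.V1Pod(metadata=k8s.V1ObjectMeta(labels={"release": "stable"})),
            },
        )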

[airflow] 17/36: Fix Providers doc (#15185)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 1f7e3646904644cf57c7258fc18620f192d9f3ed
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Sun Apr 4 16:12:58 2021 +0100

    Fix Providers doc (#15185)
    
    `pip pip install -e /path/to/my-package` -> `pip install -e /path/to/my-package`
    
    (cherry picked from commit dc969685011938f0fb692c36918a7f0feb26472a)
---
 docs/apache-airflow-providers/index.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/apache-airflow-providers/index.rst b/docs/apache-airflow-providers/index.rst
index 222112f..81e02fa 100644
--- a/docs/apache-airflow-providers/index.rst
+++ b/docs/apache-airflow-providers/index.rst
@@ -248,7 +248,7 @@ When running airflow with your provider package, there will be (at least) three
   together with the related files (e.g. ``dags`` folder)
 * The ``apache-airflow`` package
 * Your own ``myproviderpackage`` package that is independent of ``apache-airflow`` or your airflow installation, which
-  can be a local Python package (that you install via ``pip pip install -e /path/to/my-package``), a normal pip package
+  can be a local Python package (that you install via ``pip install -e /path/to/my-package``), a normal pip package
   (``pip install myproviderpackage``), or any other type of Python package
 
 In the ``myproviderpackage`` package you need to add the entry point and provide the appropriate metadata as described above.

[airflow] 08/36: Less docker magic in docs building (#15176)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit b61feb61233c2b2f11dfc48eb85ba3370d4f6b6d
Author: Kamil Breguła <mi...@users.noreply.github.com>
AuthorDate: Tue Apr 6 03:10:42 2021 +0200

    Less docker magic in docs building (#15176)
    
    (cherry picked from commit 3bd11631ff0fbff4859452513efe03674b04b141)
---
 .github/workflows/ci.yml                   |  12 --
 docs/build_docs.py                         |  91 +++------------
 docs/conf.py                               |   2 +-
 docs/exts/docs_build/code_utils.py         |  16 +--
 docs/exts/docs_build/docs_builder.py       | 171 +++++++----------------------
 docs/exts/docs_build/errors.py             |   6 +-
 docs/exts/docs_build/run_patched_sphinx.py | 105 ++++++++++++++++++
 docs/exts/docs_build/spelling_checks.py    |   6 +-
 docs/exts/provider_init_hack.py            |  10 +-
 scripts/ci/docs/ci_docs.sh                 |  15 +--
 10 files changed, 170 insertions(+), 264 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index dc98f5c..86bc960 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -463,24 +463,12 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
     env:
       RUNS_ON: ${{ fromJson(needs.build-info.outputs.runsOn) }}
       GITHUB_REGISTRY: ${{ needs.ci-images.outputs.githubRegistry }}
-      PYTHON_MAJOR_MINOR_VERSION: ${{needs.build-info.outputs.defaultPythonVersion}}
     steps:
       - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
         uses: actions/checkout@v2
         with:
           persist-credentials: false
           submodules: recursive
-      - name: "Setup python"
-        uses: actions/setup-python@v2
-        with:
-          python-version: ${{needs.build-info.outputs.defaultPythonVersion}}
-      - uses: actions/cache@v2
-        id: cache-venv-docs
-        with:
-          path: ./.docs-venv/
-          key: ${{ runner.os }}-docs-venv-${{ hashFiles('setup.py', 'setup.cfg') }}
-          restore-keys: |
-            ${{ runner.os }}-docs-venv-
       - name: "Free space"
         run: ./scripts/ci/tools/ci_free_space_on_ci.sh
       - name: "Prepare CI image ${{env.PYTHON_MAJOR_MINOR_VERSION}}:${{ env.GITHUB_REGISTRY_PULL_IMAGE_TAG }}"
diff --git a/docs/build_docs.py b/docs/build_docs.py
index 59e1681..5f1a534 100755
--- a/docs/build_docs.py
+++ b/docs/build_docs.py
@@ -18,23 +18,15 @@
 import argparse
 import multiprocessing
 import os
-import platform
 import sys
 from collections import defaultdict
-from subprocess import run
 from typing import Dict, List, NamedTuple, Optional, Tuple
 
 from rich.console import Console
 from tabulate import tabulate
 
 from docs.exts.docs_build import dev_index_generator, lint_checks  # pylint: disable=no-name-in-module
-from docs.exts.docs_build.code_utils import (
-    CONSOLE_WIDTH,
-    DOCKER_PROJECT_DIR,
-    ROOT_PROJECT_DIR,
-    TEXT_RED,
-    TEXT_RESET,
-)
+from docs.exts.docs_build.code_utils import CONSOLE_WIDTH, PROVIDER_INIT_FILE, TEXT_RED, TEXT_RESET
 from docs.exts.docs_build.docs_builder import (  # pylint: disable=no-name-in-module
     DOCS_DIR,
     AirflowDocsBuilder,
@@ -52,7 +44,7 @@ from docs.exts.docs_build.spelling_checks import (  # pylint: disable=no-name-in
     display_spelling_error_summary,
 )
 
-if __name__ != "__main__":
+if __name__ not in ("__main__", "__mp_main__"):
     raise SystemExit(
         "This file is intended to be executed as an executable program. You cannot use it as a module."
         "To run this script, run the ./build_docs.py command"
@@ -131,27 +123,13 @@ def _get_parser():
         "--jobs",
         dest='jobs',
         type=int,
-        default=1,
+        default=0,
         help=(
-            """
-    Number of parallel processes that will be spawned to build the docs.
-
-    This is usually used in CI system only. Though you can also use it to run complete check
-    of the documntation locally if you have powerful local machine.
-    Default is 1 - which means that doc check runs sequentially, This is the default behaviour
-    because autoapi extension we use is not capable of running parallel builds at the same time using
-    the same source files.
-
-    In parallel builds we are using dockerised version of image built from local sources but the image
-    has to be prepared locally (similarly as it is in CI) before you run the docs build. Any changes you
-    have done locally after building the image, will not be checked.
-
-    Typically you run parallel build in this way if you want to quickly run complete check for all docs:
+            """\
+        Number of parallel processes that will be spawned to build the docs.
 
-         ./breeze build-image --python 3.6
-         ./docs/build-docs.py -j 0
-
-"""
+        If passed 0, the value will be determined based on the number of CPUs.
+        """
         ),
     )
     parser.add_argument(
@@ -174,7 +152,6 @@ class BuildSpecification(NamedTuple):
     package_name: str
     for_production: bool
     verbose: bool
-    dockerized: bool
 
 
 class BuildDocsResult(NamedTuple):
@@ -202,7 +179,6 @@ def perform_docs_build_for_single_package(build_specification: BuildSpecificatio
     result = BuildDocsResult(
         package_name=build_specification.package_name,
         errors=builder.build_sphinx_docs(
-            dockerized=build_specification.dockerized,
             verbose=build_specification.verbose,
         ),
         log_file_name=builder.log_build_filename,
@@ -219,7 +195,6 @@ def perform_spell_check_for_single_package(build_specification: BuildSpecificati
     result = SpellCheckResult(
         package_name=build_specification.package_name,
         errors=builder.check_spelling(
-            dockerized=build_specification.dockerized,
             verbose=build_specification.verbose,
         ),
         log_file_name=builder.log_spelling_filename,
@@ -245,11 +220,6 @@ def build_docs_for_packages(
             builder = AirflowDocsBuilder(package_name=package_name, for_production=for_production)
             builder.clean_files()
     if jobs > 1:
-        if os.getenv('CI', '') == '':
-            console.print("[yellow] PARALLEL DOCKERIZED EXECUTION REQUIRES IMAGE TO BE BUILD BEFORE !!!![/]")
-            console.print("[yellow] Make sure that you've build the image before runnning docs build.[/]")
-            console.print("[yellow] otherwise local changes you've done will not be used during the check[/]")
-            console.print()
         run_in_parallel(
             all_build_errors,
             all_spelling_errors,
@@ -289,7 +259,6 @@ def run_sequentially(
                 build_specification=BuildSpecification(
                     package_name=package_name,
                     for_production=for_production,
-                    dockerized=False,
                     verbose=verbose,
                 )
             )
@@ -302,7 +271,6 @@ def run_sequentially(
                 build_specification=BuildSpecification(
                     package_name=package_name,
                     for_production=for_production,
-                    dockerized=False,
                     verbose=verbose,
                 )
             )
@@ -323,15 +291,12 @@ def run_in_parallel(
 ):
     """Run both - spellcheck and docs build sequentially without multiprocessing"""
     pool = multiprocessing.Pool(processes=jobs)
-    # until we fix autoapi, we need to run parallel builds as dockerized images
-    dockerized = True
     if not spellcheck_only:
         run_docs_build_in_parallel(
             all_build_errors=all_build_errors,
             for_production=for_production,
             current_packages=current_packages,
             verbose=verbose,
-            dockerized=dockerized,
             pool=pool,
         )
     if not docs_only:
@@ -340,34 +305,8 @@ def run_in_parallel(
             for_production=for_production,
             current_packages=current_packages,
             verbose=verbose,
-            dockerized=dockerized,
             pool=pool,
         )
-    fix_ownership()
-
-
-def fix_ownership():
-    """Fixes ownership for all files created with root user,"""
-    console.print("Fixing ownership for generated files")
-    python_version = os.getenv('PYTHON_MAJOR_MINOR_VERSION', "3.6")
-    fix_cmd = [
-        "docker",
-        "run",
-        "--entrypoint",
-        "/bin/bash",
-        "--rm",
-        "-e",
-        f"HOST_OS={platform.system()}",
-        "-e" f"HOST_USER_ID={os.getuid()}",
-        "-e",
-        f"HOST_GROUP_ID={os.getgid()}",
-        "-v",
-        f"{ROOT_PROJECT_DIR}:{DOCKER_PROJECT_DIR}",
-        f"apache/airflow:master-python{python_version}-ci",
-        "-c",
-        "/opt/airflow/scripts/in_container/run_fix_ownership.sh",
-    ]
-    run(fix_cmd, check=True)
 
 
 def print_build_output(result: BuildDocsResult):
@@ -386,7 +325,6 @@ def run_docs_build_in_parallel(
     for_production: bool,
     current_packages: List[str],
     verbose: bool,
-    dockerized: bool,
     pool,
 ):
     """Runs documentation building in parallel."""
@@ -399,7 +337,6 @@ def run_docs_build_in_parallel(
                     package_name=package_name,
                     for_production=for_production,
                     verbose=verbose,
-                    dockerized=dockerized,
                 )
             )
     with with_group("Running docs building"):
@@ -428,7 +365,6 @@ def run_spell_check_in_parallel(
     for_production: bool,
     current_packages: List[str],
     verbose: bool,
-    dockerized: bool,
     pool,
 ):
     """Runs spell check in parallel."""
@@ -437,12 +373,7 @@ def run_spell_check_in_parallel(
         for package_name in current_packages:
             console.print(f"[blue]{package_name:60}:[/] Scheduling spellchecking")
             spell_check_specifications.append(
-                BuildSpecification(
-                    package_name=package_name,
-                    for_production=for_production,
-                    verbose=verbose,
-                    dockerized=dockerized,
-                )
+                BuildSpecification(package_name=package_name, for_production=for_production, verbose=verbose)
             )
     with with_group("Running spell checking of documentation"):
         console.print()
@@ -572,10 +503,14 @@ def main():
     if not package_filters:
         _promote_new_flags()
 
+    if os.path.exists(PROVIDER_INIT_FILE):
+        os.remove(PROVIDER_INIT_FILE)
+
     print_build_errors_and_exit(
         all_build_errors,
         all_spelling_errors,
     )
 
 
-main()
+if __name__ == "__main__":
+    main()
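
With this change the ``--jobs`` flag defaults to ``0``, which the help text says means "determined
based on the number of CPUs". A sketch of that resolution under the stated assumption (the helper
name is illustrative, not taken from the patch):

.. code-block:: python

    import multiprocessing


    def resolve_jobs(jobs: int) -> int:
        # 0 (the new default) means: spawn as many build processes as there are CPUs.
        return jobs if jobs > 0 else multiprocessing.cpu_count()
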
diff --git a/docs/conf.py b/docs/conf.py
index 678f053..11708f9 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -506,7 +506,7 @@ autoapi_keep_files = True
 
 # Relative path to output the AutoAPI files into. This can also be used to place the generated documentation
 # anywhere in your documentation hierarchy.
-autoapi_root = f'{PACKAGE_NAME}/_api'
+autoapi_root = '_api'
 
 # Whether to insert the generated documentation into the TOC tree. If this is False, the default AutoAPI
 # index page is not generated and you will need to include the generated documentation in a
diff --git a/docs/exts/docs_build/code_utils.py b/docs/exts/docs_build/code_utils.py
index 5c88797..adab5c2 100644
--- a/docs/exts/docs_build/code_utils.py
+++ b/docs/exts/docs_build/code_utils.py
@@ -22,12 +22,10 @@ from docs.exts.provider_yaml_utils import load_package_data
 ROOT_PROJECT_DIR = os.path.abspath(
     os.path.join(os.path.dirname(os.path.realpath(__file__)), os.pardir, os.pardir, os.pardir)
 )
+PROVIDER_INIT_FILE = os.path.join(ROOT_PROJECT_DIR, "airflow", "providers", "__init__.py")
 DOCS_DIR = os.path.join(ROOT_PROJECT_DIR, "docs")
 AIRFLOW_DIR = os.path.join(ROOT_PROJECT_DIR, "airflow")
 
-DOCKER_PROJECT_DIR = "/opt/airflow"
-DOCKER_DOCS_DIR = os.path.join(DOCKER_PROJECT_DIR, "docs")
-DOCKER_AIRFLOW_DIR = os.path.join(DOCKER_PROJECT_DIR, "/airflow")
 ALL_PROVIDER_YAMLS = load_package_data()
 AIRFLOW_SITE_DIR = os.environ.get('AIRFLOW_SITE_DIRECTORY')
 PROCESS_TIMEOUT = 8 * 60  # 400 seconds
@@ -38,18 +36,6 @@ TEXT_RESET = '\033[0m'
 CONSOLE_WIDTH = 180
 
 
-def remap_from_docker(file_name: str, dockerized: bool):
-    """
-    Remaps filename from Docker to Host.
-    :param file_name: name of file
-    :param dockerized: whether builds were running in docker environment.
-    :return:
-    """
-    if dockerized and file_name.startswith(DOCKER_PROJECT_DIR):
-        return file_name.replace(DOCKER_PROJECT_DIR, ROOT_PROJECT_DIR)
-    return file_name
-
-
 def prepare_code_snippet(file_path: str, line_no: int, context_lines_count: int = 5) -> str:
     """
     Prepares code snippet.
diff --git a/docs/exts/docs_build/docs_builder.py b/docs/exts/docs_build/docs_builder.py
index 0669c75..669d76d 100644
--- a/docs/exts/docs_build/docs_builder.py
+++ b/docs/exts/docs_build/docs_builder.py
@@ -28,9 +28,9 @@ from docs.exts.docs_build.code_utils import (
     AIRFLOW_SITE_DIR,
     ALL_PROVIDER_YAMLS,
     CONSOLE_WIDTH,
-    DOCKER_DOCS_DIR,
     DOCS_DIR,
     PROCESS_TIMEOUT,
+    ROOT_PROJECT_DIR,
     pretty_format_path,
 )
 from docs.exts.docs_build.errors import DocBuildError, parse_sphinx_warnings
@@ -55,18 +55,10 @@ class AirflowDocsBuilder:
         return f"{DOCS_DIR}/_doctrees/docs/{self.package_name}"
 
     @property
-    def _docker_doctree_dir(self) -> str:
-        return f"{DOCKER_DOCS_DIR}/_doctrees/docs/{self.package_name}"
-
-    @property
     def _inventory_cache_dir(self) -> str:
         return f"{DOCS_DIR}/_inventory_cache"
 
     @property
-    def _docker_inventory_cache_dir(self) -> str:
-        return f"{DOCKER_DOCS_DIR}/_inventory_cache"
-
-    @property
     def is_versioned(self):
         """Is current documentation package versioned?"""
         # Disable versioning. This documentation does not apply to any released product and we can update
@@ -87,49 +79,21 @@ class AirflowDocsBuilder:
         return os.path.join(self._build_dir, f"output-spelling-{self.package_name}.log")
 
     @property
-    def docker_log_spelling_filename(self) -> str:
-        """Log from spelling job in docker."""
-        return os.path.join(self._docker_build_dir, f"output-spelling-{self.package_name}.log")
-
-    @property
     def log_spelling_output_dir(self) -> str:
         """Results from spelling job."""
         return os.path.join(self._build_dir, f"output-spelling-results-{self.package_name}")
 
     @property
-    def docker_log_spelling_output_dir(self) -> str:
-        """Results from spelling job in docker."""
-        return os.path.join(self._docker_build_dir, f"output-spelling-results-{self.package_name}")
-
-    @property
     def log_build_filename(self) -> str:
         """Log from build job."""
         return os.path.join(self._build_dir, f"output-build-{self.package_name}.log")
 
     @property
-    def docker_log_build_filename(self) -> str:
-        """Log from build job in docker."""
-        return os.path.join(self._docker_build_dir, f"output-build-{self.package_name}.log")
-
-    @property
     def log_build_warning_filename(self) -> str:
         """Warnings from build job."""
         return os.path.join(self._build_dir, f"warning-build-{self.package_name}.log")
 
     @property
-    def docker_log_warning_filename(self) -> str:
-        """Warnings from build job in docker."""
-        return os.path.join(self._docker_build_dir, f"warning-build-{self.package_name}.log")
-
-    @property
-    def _docker_build_dir(self) -> str:
-        if self.is_versioned:
-            version = "stable" if self.for_production else "latest"
-            return f"{DOCKER_DOCS_DIR}/_build/docs/{self.package_name}/{version}"
-        else:
-            return f"{DOCKER_DOCS_DIR}/_build/docs/{self.package_name}"
-
-    @property
     def _current_version(self):
         if not self.is_versioned:
             raise Exception("This documentation package is not versioned")
@@ -153,10 +117,6 @@ class AirflowDocsBuilder:
     def _src_dir(self) -> str:
         return f"{DOCS_DIR}/{self.package_name}"
 
-    @property
-    def _docker_src_dir(self) -> str:
-        return f"{DOCKER_DOCS_DIR}/{self.package_name}"
-
     def clean_files(self) -> None:
         """Cleanup all artifacts generated by previous builds."""
         api_dir = os.path.join(self._src_dir, "_api")
@@ -166,58 +126,33 @@ class AirflowDocsBuilder:
         os.makedirs(api_dir, exist_ok=True)
         os.makedirs(self._build_dir, exist_ok=True)
 
-    def check_spelling(self, verbose: bool, dockerized: bool) -> List[SpellingError]:
+    def check_spelling(self, verbose: bool) -> List[SpellingError]:
         """
         Checks spelling
 
         :param verbose: whether to show output while running
-        :param dockerized: whether to run dockerized build (required for paralllel processing on CI)
         :return: list of errors
         """
         spelling_errors = []
         os.makedirs(self._build_dir, exist_ok=True)
         shutil.rmtree(self.log_spelling_output_dir, ignore_errors=True)
         os.makedirs(self.log_spelling_output_dir, exist_ok=True)
-        if dockerized:
-            python_version = os.getenv('PYTHON_MAJOR_MINOR_VERSION', "3.6")
-            build_cmd = [
-                "docker",
-                "run",
-                "--rm",
-                "-e",
-                "AIRFLOW_FOR_PRODUCTION",
-                "-e",
-                "AIRFLOW_PACKAGE_NAME",
-                "-v",
-                f"{self._build_dir}:{self._docker_build_dir}",
-                "-v",
-                f"{self._inventory_cache_dir}:{self._docker_inventory_cache_dir}",
-                "-w",
-                DOCKER_DOCS_DIR,
-                f"apache/airflow:master-python{python_version}-ci",
-                "/opt/airflow/scripts/in_container/run_anything.sh",
-            ]
-        else:
-            build_cmd = []
-
-        build_cmd.extend(
-            [
-                "sphinx-build",
-                "-W",  # turn warnings into errors
-                "--color",  # do emit colored output
-                "-T",  # show full traceback on exception
-                "-b",  # builder to use
-                "spelling",
-                "-c",
-                DOCS_DIR if not dockerized else DOCKER_DOCS_DIR,
-                "-d",  # path for the cached environment and doctree files
-                self._doctree_dir if not dockerized else self._docker_doctree_dir,
-                self._src_dir
-                if not dockerized
-                else self._docker_src_dir,  # path to documentation source files
-                self.log_spelling_output_dir if not dockerized else self.docker_log_spelling_output_dir,
-            ]
-        )
+
+        build_cmd = [
+            os.path.join(ROOT_PROJECT_DIR, "docs", "exts", "docs_build", "run_patched_sphinx.py"),
+            "-W",  # turn warnings into errors
+            "--color",  # do emit colored output
+            "-T",  # show full traceback on exception
+            "-b",  # builder to use
+            "spelling",
+            "-c",
+            DOCS_DIR,
+            "-d",  # path for the cached environment and doctree files
+            self._doctree_dir,
+            self._src_dir,  # path to documentation source files
+            self.log_spelling_output_dir,
+        ]
+
         env = os.environ.copy()
         env['AIRFLOW_PACKAGE_NAME'] = self.package_name
         if self.for_production:
@@ -246,7 +181,7 @@ class AirflowDocsBuilder:
                     suggestion=None,
                     context_line=None,
                     message=(
-                        f"Sphinx spellcheck returned non-zero exit status: " f"{completed_proc.returncode}."
+                        f"Sphinx spellcheck returned non-zero exit status: {completed_proc.returncode}."
                     ),
                 )
             )
@@ -254,69 +189,45 @@ class AirflowDocsBuilder:
             for filepath in glob(f"{self.log_spelling_output_dir}/**/*.spelling", recursive=True):
                 with open(filepath) as spelling_file:
                     warning_text += spelling_file.read()
-            spelling_errors.extend(parse_spelling_warnings(warning_text, self._src_dir, dockerized))
+
+            spelling_errors.extend(parse_spelling_warnings(warning_text, self._src_dir))
             console.print(f"[blue]{self.package_name:60}:[/] [red]Finished spell-checking with errors[/]")
         else:
             if spelling_errors:
                 console.print(
-                    f"[blue]{self.package_name:60}:[/] [yellow]Finished spell-checking " f"with warnings[/]"
+                    f"[blue]{self.package_name:60}:[/] [yellow]Finished spell-checking with warnings[/]"
                 )
             else:
                 console.print(
-                    f"[blue]{self.package_name:60}:[/] [green]Finished spell-checking " f"successfully[/]"
+                    f"[blue]{self.package_name:60}:[/] [green]Finished spell-checking successfully[/]"
                 )
         return spelling_errors
 
-    def build_sphinx_docs(self, verbose: bool, dockerized: bool) -> List[DocBuildError]:
+    def build_sphinx_docs(self, verbose: bool) -> List[DocBuildError]:
         """
         Build Sphinx documentation.
 
         :param verbose: whether to show output while running
-        :param dockerized: whether to run dockerized build (required for paralllel processing on CI)
         :return: list of errors
         """
         build_errors = []
         os.makedirs(self._build_dir, exist_ok=True)
-        if dockerized:
-            python_version = os.getenv('PYTHON_MAJOR_MINOR_VERSION', "3.6")
-            build_cmd = [
-                "docker",
-                "run",
-                "--rm",
-                "-e",
-                "AIRFLOW_FOR_PRODUCTION",
-                "-e",
-                "AIRFLOW_PACKAGE_NAME",
-                "-v",
-                f"{self._build_dir}:{self._docker_build_dir}",
-                "-v",
-                f"{self._inventory_cache_dir}:{self._docker_inventory_cache_dir}",
-                "-w",
-                DOCKER_DOCS_DIR,
-                f"apache/airflow:master-python{python_version}-ci",
-                "/opt/airflow/scripts/in_container/run_anything.sh",
-            ]
-        else:
-            build_cmd = []
-        build_cmd.extend(
-            [
-                "sphinx-build",
-                "-T",  # show full traceback on exception
-                "--color",  # do emit colored output
-                "-b",  # builder to use
-                "html",
-                "-d",  # path for the cached environment and doctree files
-                self._doctree_dir if not dockerized else self._docker_doctree_dir,
-                "-c",
-                DOCS_DIR if not dockerized else DOCKER_DOCS_DIR,
-                "-w",  # write warnings (and errors) to given file
-                self.log_build_warning_filename if not dockerized else self.docker_log_warning_filename,
-                self._src_dir
-                if not dockerized
-                else self._docker_src_dir,  # path to documentation source files
-                self._build_dir if not dockerized else self._docker_build_dir,  # path to output directory
-            ]
-        )
+
+        build_cmd = [
+            os.path.join(ROOT_PROJECT_DIR, "docs", "exts", "docs_build", "run_patched_sphinx.py"),
+            "-T",  # show full traceback on exception
+            "--color",  # do emit colored output
+            "-b",  # builder to use
+            "html",
+            "-d",  # path for the cached environment and doctree files
+            self._doctree_dir,
+            "-c",
+            DOCS_DIR,
+            "-w",  # write warnings (and errors) to given file
+            self.log_build_warning_filename,
+            self._src_dir,
+            self._build_dir,  # path to output directory
+        ]
         env = os.environ.copy()
         env['AIRFLOW_PACKAGE_NAME'] = self.package_name
         if self.for_production:
@@ -353,7 +264,7 @@ class AirflowDocsBuilder:
                 warning_text = warning_file.read()
             # Remove 7-bit C1 ANSI escape sequences
             warning_text = re.sub(r"\x1B[@-_][0-?]*[ -/]*[@-~]", "", warning_text)
-            build_errors.extend(parse_sphinx_warnings(warning_text, self._src_dir, dockerized))
+            build_errors.extend(parse_sphinx_warnings(warning_text, self._src_dir))
         if build_errors:
             console.print(f"[blue]{self.package_name:60}:[/] [red]Finished docs building with errors[/]")
         else:
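
For reference, the non-dockerized command the builder assembles above boils down to invoking the
patched Sphinx wrapper directly; a sketch with placeholder paths (not the exact values computed by
``AirflowDocsBuilder``):

.. code-block:: python

    import os
    import subprocess

    # Placeholder paths standing in for DOCS_DIR, _doctree_dir, _src_dir and _build_dir.
    build_cmd = [
        "docs/exts/docs_build/run_patched_sphinx.py",
        "-T",  # show full traceback on exception
        "--color",
        "-b", "html",
        "-d", "docs/_doctrees/docs/apache-airflow",
        "-c", "docs",
        "-w", "docs/_build/warning-build-apache-airflow.log",
        "docs/apache-airflow",
        "docs/_build/docs/apache-airflow/latest",
    ]
    env = os.environ.copy()
    env["AIRFLOW_PACKAGE_NAME"] = "apache-airflow"
    subprocess.run(build_cmd, env=env, check=False)
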
diff --git a/docs/exts/docs_build/errors.py b/docs/exts/docs_build/errors.py
index 954262d..1a2ae06 100644
--- a/docs/exts/docs_build/errors.py
+++ b/docs/exts/docs_build/errors.py
@@ -21,7 +21,7 @@ from typing import Dict, List, NamedTuple, Optional
 from rich.console import Console
 
 from airflow.utils.code_utils import prepare_code_snippet
-from docs.exts.docs_build.code_utils import CONSOLE_WIDTH, remap_from_docker
+from docs.exts.docs_build.code_utils import CONSOLE_WIDTH
 
 CURRENT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__)))
 DOCS_DIR = os.path.abspath(os.path.join(CURRENT_DIR, os.pardir, os.pardir))
@@ -82,7 +82,7 @@ def display_errors_summary(build_errors: Dict[str, List[DocBuildError]]) -> None
     console.print()
 
 
-def parse_sphinx_warnings(warning_text: str, docs_dir: str, dockerized: bool) -> List[DocBuildError]:
+def parse_sphinx_warnings(warning_text: str, docs_dir: str) -> List[DocBuildError]:
     """
     Parses warnings from Sphinx.
 
@@ -98,7 +98,7 @@ def parse_sphinx_warnings(warning_text: str, docs_dir: str, dockerized: bool) ->
             try:
                 sphinx_build_errors.append(
                     DocBuildError(
-                        file_path=remap_from_docker(os.path.join(docs_dir, warning_parts[0]), dockerized),
+                        file_path=os.path.join(docs_dir, warning_parts[0]),
                         line_no=int(warning_parts[1]),
                         message=warning_parts[2],
                     )
diff --git a/docs/exts/docs_build/run_patched_sphinx.py b/docs/exts/docs_build/run_patched_sphinx.py
new file mode 100755
index 0000000..887b982
--- /dev/null
+++ b/docs/exts/docs_build/run_patched_sphinx.py
@@ -0,0 +1,105 @@
+#!/usr/bin/env python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+import sys
+
+import autoapi
+from autoapi.extension import (
+    LOGGER,
+    ExtensionError,
+    bold,
+    darkgreen,
+    default_backend_mapping,
+    default_file_mapping,
+    default_ignore_patterns,
+)
+from sphinx.cmd.build import main
+
+
+def run_autoapi(app):
+    """Load AutoAPI data from the filesystem."""
+    if not app.config.autoapi_dirs:
+        raise ExtensionError("You must configure an autoapi_dirs setting")
+
+    # Make sure the paths are full
+    normalized_dirs = []
+    autoapi_dirs = app.config.autoapi_dirs
+    if isinstance(autoapi_dirs, str):
+        autoapi_dirs = [autoapi_dirs]
+    for path in autoapi_dirs:
+        if os.path.isabs(path):
+            normalized_dirs.append(path)
+        else:
+            normalized_dirs.append(os.path.normpath(os.path.join(app.confdir, path)))
+
+    for _dir in normalized_dirs:
+        if not os.path.exists(_dir):
+            raise ExtensionError(
+                "AutoAPI Directory `{dir}` not found. "
+                "Please check your `autoapi_dirs` setting.".format(dir=_dir)
+            )
+
+    # Change from app.confdir to app.srcdir.
+    # Before:
+    # - normalized_root = os.path.normpath(
+    # -    os.path.join(app.confdir, app.config.autoapi_root)
+    # -)
+    normalized_root = os.path.normpath(os.path.join(app.srcdir, app.config.autoapi_root))
+    url_root = os.path.join("/", app.config.autoapi_root)
+    sphinx_mapper = default_backend_mapping[app.config.autoapi_type]
+    sphinx_mapper_obj = sphinx_mapper(app, template_dir=app.config.autoapi_template_dir, url_root=url_root)
+    app.env.autoapi_mapper = sphinx_mapper_obj
+
+    if app.config.autoapi_file_patterns:
+        file_patterns = app.config.autoapi_file_patterns
+    else:
+        file_patterns = default_file_mapping.get(app.config.autoapi_type, [])
+
+    if app.config.autoapi_ignore:
+        ignore_patterns = app.config.autoapi_ignore
+    else:
+        ignore_patterns = default_ignore_patterns.get(app.config.autoapi_type, [])
+
+    if ".rst" in app.config.source_suffix:
+        out_suffix = ".rst"
+    elif ".txt" in app.config.source_suffix:
+        out_suffix = ".txt"
+    else:
+        # Fallback to first suffix listed
+        out_suffix = app.config.source_suffix[0]
+
+    # Actual meat of the run.
+    LOGGER.info(bold("[AutoAPI] ") + darkgreen("Loading Data"))
+    sphinx_mapper_obj.load(patterns=file_patterns, dirs=normalized_dirs, ignore=ignore_patterns)
+
+    LOGGER.info(bold("[AutoAPI] ") + darkgreen("Mapping Data"))
+    sphinx_mapper_obj.map(options=app.config.autoapi_options)
+
+    if app.config.autoapi_generate_api_docs:
+        LOGGER.info(bold("[AutoAPI] ") + darkgreen("Rendering Data"))
+        sphinx_mapper_obj.output_rst(root=normalized_root, source_suffix=out_suffix)
+
+
+# HACK: sphinx-auto map did not correctly use the confdir attribute instead of srcdir when specifying the
+# directory to contain the generated files.
+# Unfortunately we have a problem updating to a newer version of this library and we have to use
+# sphinx-autoapi v1.0.0, so I am monkeypatching this library to fix this one problem.
+autoapi.extension.run_autoapi = run_autoapi
+
+sys.exit(main(sys.argv[1:]))
diff --git a/docs/exts/docs_build/spelling_checks.py b/docs/exts/docs_build/spelling_checks.py
index 2be9cca..4d3c26d 100644
--- a/docs/exts/docs_build/spelling_checks.py
+++ b/docs/exts/docs_build/spelling_checks.py
@@ -23,7 +23,7 @@ from typing import Dict, List, NamedTuple, Optional
 from rich.console import Console
 
 from airflow.utils.code_utils import prepare_code_snippet
-from docs.exts.docs_build.code_utils import CONSOLE_WIDTH, remap_from_docker
+from docs.exts.docs_build.code_utils import CONSOLE_WIDTH
 
 CURRENT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__)))
 DOCS_DIR = os.path.abspath(os.path.join(CURRENT_DIR, os.pardir, os.pardir))
@@ -80,7 +80,7 @@ class SpellingError(NamedTuple):
         return left < right
 
 
-def parse_spelling_warnings(warning_text: str, docs_dir: str, dockerized: bool) -> List[SpellingError]:
+def parse_spelling_warnings(warning_text: str, docs_dir: str) -> List[SpellingError]:
     """
     Parses warnings from Sphinx.
 
@@ -99,7 +99,7 @@ def parse_spelling_warnings(warning_text: str, docs_dir: str, dockerized: bool)
             try:
                 sphinx_spelling_errors.append(
                     SpellingError(
-                        file_path=remap_from_docker(os.path.join(docs_dir, warning_parts[0]), dockerized),
+                        file_path=os.path.join(docs_dir, warning_parts[0]),
                         line_no=int(warning_parts[1]) if warning_parts[1] not in ('None', '') else None,
                         spelling=warning_parts[2],
                         suggestion=warning_parts[3] if warning_parts[3] else None,
diff --git a/docs/exts/provider_init_hack.py b/docs/exts/provider_init_hack.py
index 0d88559..40f7fef 100644
--- a/docs/exts/provider_init_hack.py
+++ b/docs/exts/provider_init_hack.py
@@ -34,17 +34,12 @@ PROVIDER_INIT_FILE = os.path.join(ROOT_PROJECT_DIR, "airflow", "providers", "__i
 def _create_init_py(app, config):
     del app
     del config
+    # This file is deleted by /docs/build_docs.py. If you are not using the script, the file will be
+    # deleted by pre-commit.
     with open(PROVIDER_INIT_FILE, "wt"):
         pass
 
 
-def _delete_init_py(app, exception):
-    del app
-    del exception
-    if os.path.exists(PROVIDER_INIT_FILE):
-        os.remove(PROVIDER_INIT_FILE)
-
-
 def setup(app: Sphinx):
     """
     Sets the plugin up and returns configuration of the plugin.
@@ -53,6 +48,5 @@ def setup(app: Sphinx):
     :return json description of the configuration that is needed by the plugin.
     """
     app.connect("config-inited", _create_init_py)
-    app.connect("build-finished", _delete_init_py)
 
     return {"version": "builtin", "parallel_read_safe": True, "parallel_write_safe": True}
diff --git a/scripts/ci/docs/ci_docs.sh b/scripts/ci/docs/ci_docs.sh
index 003a8c2..be0d2ed 100755
--- a/scripts/ci/docs/ci_docs.sh
+++ b/scripts/ci/docs/ci_docs.sh
@@ -22,17 +22,4 @@ build_images::prepare_ci_build
 
 build_images::rebuild_ci_image_if_needed_with_group
 
-start_end::group_start "Preparing venv for doc building"
-
-python3 -m venv .docs-venv
-source .docs-venv/bin/activate
-export PYTHONPATH=${AIRFLOW_SOURCES}
-
-pip install --upgrade pip==20.2.4
-
-pip install .[doc] --upgrade --constraint \
-    "https://raw.githubusercontent.com/apache/airflow/constraints-${DEFAULT_BRANCH}/constraints-${PYTHON_MAJOR_MINOR_VERSION}.txt"
-
-start_end::group_end
-
-"${AIRFLOW_SOURCES}/docs/build_docs.py" -j 0 "${@}"
+runs::run_docs "${@}"

[airflow] 16/36: Replace new url for Stable Airflow Docs (#15169)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit b96a8bfb515be262fbe0b89c4930bf5f6e40d349
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Sun Apr 4 16:13:34 2021 +0100

    Replace new url for Stable Airflow Docs (#15169)
    
    `https://airflow.apache.org/docs/stable/` -> `https://airflow.apache.org/docs/apache-airflow/stable`
    
    (cherry picked from commit 64b00896d905abcf1fbae195a29b81f393319c5f)
---
 UPDATING.md                                                           | 4 ++--
 airflow/config_templates/config.yml                                   | 2 +-
 airflow/config_templates/default_airflow.cfg                          | 2 +-
 .../operators/azure_blob_to_gcs.rst                                   | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/UPDATING.md b/UPDATING.md
index d024d51..7966f58 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -204,7 +204,7 @@ from my_plugin import MyOperator
 
 The name under `airflow.operators.` was the plugin name, where as in the second example it is the python module name where the operator is defined.
 
-See https://airflow.apache.org/docs/stable/howto/custom-operator.html for more info.
+See https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html for more info.
 
 ### Importing Hooks via plugins is no longer supported
 
@@ -222,7 +222,7 @@ from my_plugin import MyHook
 
 It is still possible (but not required) to "register" hooks in plugins. This is to allow future support for dynamically populating the Connections form in the UI.
 
-See https://airflow.apache.org/docs/stable/howto/custom-operator.html for more info.
+See https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html for more info.
 
 ### Adding Operators and Sensors via plugins is no longer supported
 
diff --git a/airflow/config_templates/config.yml b/airflow/config_templates/config.yml
index c53a6ca..32694e4 100644
--- a/airflow/config_templates/config.yml
+++ b/airflow/config_templates/config.yml
@@ -718,7 +718,7 @@
     - name: auth_backend
       description: |
         How to authenticate users of the API. See
-        https://airflow.apache.org/docs/stable/security.html for possible values.
+        https://airflow.apache.org/docs/apache-airflow/stable/security.html for possible values.
         ("airflow.api.auth.backend.default" allows all requests for historic reasons)
       version_added: ~
       type: string
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 8e3ebf0..7685457 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -384,7 +384,7 @@ fail_fast = False
 enable_experimental_api = False
 
 # How to authenticate users of the API. See
-# https://airflow.apache.org/docs/stable/security.html for possible values.
+# https://airflow.apache.org/docs/apache-airflow/stable/security.html for possible values.
 # ("airflow.api.auth.backend.default" allows all requests for historic reasons)
 auth_backend = airflow.api.auth.backend.deny_all
 
diff --git a/docs/apache-airflow-providers-microsoft-azure/operators/azure_blob_to_gcs.rst b/docs/apache-airflow-providers-microsoft-azure/operators/azure_blob_to_gcs.rst
index 11daa03..7b13097 100644
--- a/docs/apache-airflow-providers-microsoft-azure/operators/azure_blob_to_gcs.rst
+++ b/docs/apache-airflow-providers-microsoft-azure/operators/azure_blob_to_gcs.rst
@@ -31,7 +31,7 @@ Please follow Azure
 to do it.
 
 TOKEN should be added to the Connection in Airflow in JSON format, Login and Password as plain text.
-You can check `how to do such connection <https://airflow.apache.org/docs/stable/howto/connection/index.html#editing-a-connection-with-the-ui>`_.
+You can check `how to do such connection <https://airflow.apache.org/docs/apache-airflow/stable/howto/connection/index.html#editing-a-connection-with-the-ui>`_.
 
 See following example.
 Set values for these fields:

[airflow] 35/36: Fix missing on_load trigger for folder-based plugins (#15208)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 683005a390cac6957e69f792b210374818d8cee1
Author: Jed Cunningham <66...@users.noreply.github.com>
AuthorDate: Tue Apr 6 15:48:12 2021 -0600

    Fix missing on_load trigger for folder-based plugins (#15208)
    
    (cherry picked from commit 97b7780df48b412e104ff4adeecbe715264f00eb)
---
 airflow/plugins_manager.py            | 23 +++++++++-------
 tests/plugins/test_plugin.py          |  7 +++++
 tests/plugins/test_plugins_manager.py | 49 +++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/airflow/plugins_manager.py b/airflow/plugins_manager.py
index b68dbb9..cf957ff 100644
--- a/airflow/plugins_manager.py
+++ b/airflow/plugins_manager.py
@@ -173,13 +173,23 @@ def is_valid_plugin(plugin_obj):
     return False
 
 
+def register_plugin(plugin_instance):
+    """
+    Start plugin load and register it after success initialization
+
+    :param plugin_instance: subclass of AirflowPlugin
+    """
+    global plugins  # pylint: disable=global-statement
+    plugin_instance.on_load()
+    plugins.append(plugin_instance)
+
+
 def load_entrypoint_plugins():
     """
     Load and register plugins AirflowPlugin subclasses from the entrypoints.
     The entry_point group should be 'airflow.plugins'.
     """
     global import_errors  # pylint: disable=global-statement
-    global plugins  # pylint: disable=global-statement
 
     log.debug("Loading plugins from entrypoints")
 
@@ -191,10 +201,8 @@ def load_entrypoint_plugins():
                 continue
 
             plugin_instance = plugin_class()
-            if callable(getattr(plugin_instance, 'on_load', None)):
-                plugin_instance.on_load()
-                plugin_instance.source = EntryPointSource(entry_point, dist)
-                plugins.append(plugin_instance)
+            plugin_instance.source = EntryPointSource(entry_point, dist)
+            register_plugin(plugin_instance)
         except Exception as e:  # pylint: disable=broad-except
             log.exception("Failed to import plugin %s", entry_point.name)
             import_errors[entry_point.module] = str(e)
@@ -203,11 +211,9 @@ def load_entrypoint_plugins():
 def load_plugins_from_plugin_directory():
     """Load and register Airflow Plugins from plugins directory"""
     global import_errors  # pylint: disable=global-statement
-    global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
     for file_path in find_path_from_directory(settings.PLUGINS_FOLDER, ".airflowignore"):
-
         if not os.path.isfile(file_path):
             continue
         mod_name, file_ext = os.path.splitext(os.path.split(file_path)[-1])
@@ -225,8 +231,7 @@ def load_plugins_from_plugin_directory():
             for mod_attr_value in (m for m in mod.__dict__.values() if is_valid_plugin(m)):
                 plugin_instance = mod_attr_value()
                 plugin_instance.source = PluginsDirectorySource(file_path)
-                plugins.append(plugin_instance)
-
+                register_plugin(plugin_instance)
         except Exception as e:  # pylint: disable=broad-except
             log.exception(e)
             log.error('Failed to import plugin %s', file_path)
diff --git a/tests/plugins/test_plugin.py b/tests/plugins/test_plugin.py
index d52d8e5..ca02a39 100644
--- a/tests/plugins/test_plugin.py
+++ b/tests/plugins/test_plugin.py
@@ -127,3 +127,10 @@ class MockPluginB(AirflowPlugin):
 
 class MockPluginC(AirflowPlugin):
     name = 'plugin-c'
+
+
+class AirflowTestOnLoadPlugin(AirflowPlugin):
+    name = 'preload'
+
+    def on_load(self, *args, **kwargs):
+        self.name = 'postload'
diff --git a/tests/plugins/test_plugins_manager.py b/tests/plugins/test_plugins_manager.py
index f730f17..7c4d86a 100644
--- a/tests/plugins/test_plugins_manager.py
+++ b/tests/plugins/test_plugins_manager.py
@@ -17,18 +17,33 @@
 # under the License.
 import importlib
 import logging
+import os
 import sys
+import tempfile
 import unittest
 from unittest import mock
 
+import pytest
+
 from airflow.hooks.base import BaseHook
 from airflow.plugins_manager import AirflowPlugin
 from airflow.www import app as application
+from tests.test_utils.config import conf_vars
 from tests.test_utils.mock_plugins import mock_plugin_manager
 
 py39 = sys.version_info >= (3, 9)
 importlib_metadata = 'importlib.metadata' if py39 else 'importlib_metadata'
 
+ON_LOAD_EXCEPTION_PLUGIN = """
+from airflow.plugins_manager import AirflowPlugin
+
+class AirflowTestOnLoadExceptionPlugin(AirflowPlugin):
+    name = 'preload'
+
+    def on_load(self, *args, **kwargs):
+        raise Exception("oops")
+"""
+
 
 class TestPluginsRBAC(unittest.TestCase):
     def setUp(self):
@@ -145,6 +160,40 @@ class TestPluginsManager:
         assert caplog.records[-1].levelname == 'DEBUG'
         assert caplog.records[-1].msg == 'Loading %d plugin(s) took %.2f seconds'
 
+    def test_loads_filesystem_plugins(self, caplog):
+        from airflow import plugins_manager
+
+        with mock.patch('airflow.plugins_manager.plugins', []):
+            plugins_manager.load_plugins_from_plugin_directory()
+
+            assert 5 == len(plugins_manager.plugins)
+            for plugin in plugins_manager.plugins:
+                if 'AirflowTestOnLoadPlugin' not in str(plugin):
+                    continue
+                assert 'postload' == plugin.name
+                break
+            else:
+                pytest.fail("Wasn't able to find a registered `AirflowTestOnLoadPlugin`")
+
+            assert caplog.record_tuples == []
+
+    def test_loads_filesystem_plugins_exception(self, caplog):
+        from airflow import plugins_manager
+
+        with mock.patch('airflow.plugins_manager.plugins', []):
+            with tempfile.TemporaryDirectory() as tmpdir:
+                with open(os.path.join(tmpdir, 'testplugin.py'), "w") as f:
+                    f.write(ON_LOAD_EXCEPTION_PLUGIN)
+
+                with conf_vars({('core', 'plugins_folder'): tmpdir}):
+                    plugins_manager.load_plugins_from_plugin_directory()
+
+            assert plugins_manager.plugins == []
+
+            received_logs = caplog.text
+            assert 'Failed to import plugin' in received_logs
+            assert 'testplugin.py' in received_logs
+
     def test_should_warning_about_incompatible_plugins(self, caplog):
         class AirflowAdminViewsPlugin(AirflowPlugin):
             name = "test_admin_views_plugin"

[airflow] 15/36: Bugfix: resources in `executor_config` breaks Graph View in UI (#15199)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit b6229157e97f2614754e1aa2db3238c7d2472f22
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Mon Apr 5 12:41:29 2021 +0100

    Bugfix: resources in `executor_config` breaks Graph View in UI (#15199)
    
    closes https://github.com/apache/airflow/issues/14327
    
    When using `KubernetesExecutor` with a task defined as follows:
    
    ```python
    PythonOperator(
        task_id=f"sync_{table_name}",
        python_callable=sync_table,
        provide_context=True,
        op_kwargs={"table_name": table_name},
        executor_config={"KubernetesExecutor": {"request_cpu": "1"}},
        retries=5,
        dag=dag,
    )
    ```
    
    it breaks the UI, because setting resources this way is only kept
    for backwards compatibility.

    This commit fixes it.
    
    (cherry picked from commit 7b577c35e241182f3f3473ca02da197f1b5f7437)
---
 airflow/utils/json.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/airflow/utils/json.py b/airflow/utils/json.py
index 45dda75..8e22408 100644
--- a/airflow/utils/json.py
+++ b/airflow/utils/json.py
@@ -66,7 +66,7 @@ class AirflowJsonEncoder(JSONEncoder):
             obj, (np.float_, np.float16, np.float32, np.float64, np.complex_, np.complex64, np.complex128)
         ):
             return float(obj)
-        elif k8s is not None and isinstance(obj, k8s.V1Pod):
+        elif k8s is not None and isinstance(obj, (k8s.V1Pod, k8s.V1ResourceRequirements)):
             from airflow.kubernetes.pod_generator import PodGenerator
 
             return PodGenerator.serialize_pod(obj)
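To illustrate the effect of this one-line change (an example added for this digest, not part of the commit): serializing a bare `V1ResourceRequirements` through the encoder is now routed through `PodGenerator.serialize_pod` instead of falling through to the encoder's error path. A minimal sketch, assuming an environment with Airflow and the `kubernetes` Python client installed:

```python
# Illustrative sketch; assumes apache-airflow and the kubernetes client are installed.
import json

from kubernetes.client import models as k8s

from airflow.utils.json import AirflowJsonEncoder

# Roughly the kind of object the legacy executor_config form shown in the
# commit message gets converted into.
resources = k8s.V1ResourceRequirements(requests={"cpu": "1"})

# Before the fix this fell through to the encoder's fallback and raised TypeError;
# with the fix it serializes cleanly, so the Graph View can render the task again.
print(json.dumps(resources, cls=AirflowJsonEncoder))
```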

[airflow] 36/36: Import Connection lazily in hooks to avoid cycles (#15361)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit f769e810a39ccaea0f8fc1b12986e5bcb02d6749
Author: Tzu-ping Chung <tp...@astronomer.io>
AuthorDate: Wed Apr 14 21:33:00 2021 +0800

    Import Connection lazily in hooks to avoid cycles (#15361)
    
    The current implementation imports Connection at import time, which
    causes a circular import when a model class needs to reference a hook
    class.
    
    By applying this fix, the airflow.hooks package is completely decoupled
    from airflow.models at import time, so model code can reference hooks.
    Hooks, on the other hand, generally don't reference model classes.
    
    Fix #15325.
    
    (cherry picked from commit 75603160848e4199ed368809dfd441dcc5ddbd82)
---
 airflow/hooks/base.py | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/airflow/hooks/base.py b/airflow/hooks/base.py
index b3c0c11..dee76dc 100644
--- a/airflow/hooks/base.py
+++ b/airflow/hooks/base.py
@@ -18,12 +18,14 @@
 """Base class for all hooks"""
 import logging
 import warnings
-from typing import Any, Dict, List
+from typing import TYPE_CHECKING, Any, Dict, List
 
-from airflow.models.connection import Connection
 from airflow.typing_compat import Protocol
 from airflow.utils.log.logging_mixin import LoggingMixin
 
+if TYPE_CHECKING:
+    from airflow.models.connection import Connection  # Avoid circular imports.
+
 log = logging.getLogger(__name__)
 
 
@@ -37,7 +39,7 @@ class BaseHook(LoggingMixin):
     """
 
     @classmethod
-    def get_connections(cls, conn_id: str) -> List[Connection]:
+    def get_connections(cls, conn_id: str) -> List["Connection"]:
         """
         Get all connections as an iterable, given the connection id.
 
@@ -53,13 +55,15 @@ class BaseHook(LoggingMixin):
         return [cls.get_connection(conn_id)]
 
     @classmethod
-    def get_connection(cls, conn_id: str) -> Connection:
+    def get_connection(cls, conn_id: str) -> "Connection":
         """
         Get connection, given connection id.
 
         :param conn_id: connection id
         :return: connection
         """
+        from airflow.models.connection import Connection
+
         conn = Connection.get_connection_from_secrets(conn_id)
         if conn.host:
             log.info(
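The pattern applied here is general: keep the import under `TYPE_CHECKING` so annotations still resolve for type checkers, and defer the runtime import into the function body so neither module needs the other while it is being imported. A sketch of the same idea with placeholder names (`heavy_models` and `Record` are illustrative, not Airflow modules):

```python
# Generic illustration of the lazy-import pattern from the commit above.
# `heavy_models` and `Record` are placeholder names, not real Airflow modules.
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Only evaluated by type checkers; never executed at runtime, so no import cycle.
    from heavy_models import Record


def load_records(key: str) -> List["Record"]:
    # The runtime import happens on first call, after both modules have finished
    # importing, which is what breaks the circular dependency.
    from heavy_models import Record

    return [Record(key)]
```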

[airflow] 03/36: Fixes problem when Pull Request is `weird` - has null head_repo (#15189)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 6f4e134128db744a8f3001c7e4e0b0f56d05f308
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Sun Apr 4 20:30:02 2021 +0200

    Fixes problem when Pull Request is `weird` - has null head_repo (#15189)
    
    Fixes: #15188
    (cherry picked from commit 041a09f3ee6bc447c3457b108bd5431a2fd70ad9)
---
 .github/actions/cancel-workflow-runs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/actions/cancel-workflow-runs b/.github/actions/cancel-workflow-runs
index 953e057..8248bc1 160000
--- a/.github/actions/cancel-workflow-runs
+++ b/.github/actions/cancel-workflow-runs
@@ -1 +1 @@
-Subproject commit 953e057dc81d3458935a18d1184c386b0f6b5738
+Subproject commit 8248bc1feff049e98c0e6a96889b147199c38203

[airflow] 05/36: Merges quarantined tests into single job (#15153)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit ca9dba010df57fa638e501e957711c443fedcd6c
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Mon Apr 5 19:58:10 2021 +0200

    Merges quarantined tests into single job (#15153)
    
    (cherry picked from commit 1087226f756b3ff9ea48398e53f9074b0ed4c1cc)
---
 .github/workflows/ci.yml                           |   9 +-
 scripts/ci/libraries/_all_libs.sh                  |   2 +
 scripts/ci/libraries/_initialization.sh            |   3 +-
 scripts/ci/libraries/_parallel.sh                  |  35 +++++-
 scripts/ci/libraries/_testing.sh                   | 116 +++++++++++++++++
 scripts/ci/testing/ci_run_airflow_testing.sh       | 140 +++------------------
 scripts/ci/testing/ci_run_quarantined_tests.sh     |  87 +++++++++++++
 .../ci_run_single_airflow_test_in_docker.sh        |   6 +-
 8 files changed, 259 insertions(+), 139 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index ddc985b..dc98f5c 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -817,15 +817,8 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
     runs-on: ${{ fromJson(needs.build-info.outputs.runsOn) }}
     continue-on-error: true
     needs: [build-info, ci-images]
-    strategy:
-      matrix:
-        include:
-          - backend: mysql
-          - backend: postgres
-          - backend: sqlite
     env:
       RUNS_ON: ${{ fromJson(needs.build-info.outputs.runsOn) }}
-      BACKEND: ${{ matrix.backend }}
       PYTHON_MAJOR_MINOR_VERSION: ${{ needs.build-info.outputs.defaultPythonVersion }}
       MYSQL_VERSION: ${{needs.build-info.outputs.defaultMySQLVersion}}
       POSTGRES_VERSION: ${{needs.build-info.outputs.defaultPostgresVersion}}
@@ -860,7 +853,7 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
       - name: "Prepare CI image ${{env.PYTHON_MAJOR_MINOR_VERSION}}:${{ env.GITHUB_REGISTRY_PULL_IMAGE_TAG }}"
         run: ./scripts/ci/images/ci_prepare_ci_image_on_ci.sh
       - name: "Tests: Quarantined"
-        run: ./scripts/ci/testing/ci_run_airflow_testing.sh
+        run: ./scripts/ci/testing/ci_run_quarantined_tests.sh
       - name: "Upload Quarantine test results"
         uses: actions/upload-artifact@v2
         if: always()
diff --git a/scripts/ci/libraries/_all_libs.sh b/scripts/ci/libraries/_all_libs.sh
index 09a147d..04e25e8 100755
--- a/scripts/ci/libraries/_all_libs.sh
+++ b/scripts/ci/libraries/_all_libs.sh
@@ -60,6 +60,8 @@ readonly SCRIPTS_CI_DIR
 . "${LIBRARIES_DIR}"/_spinner.sh
 # shellcheck source=scripts/ci/libraries/_start_end.sh
 . "${LIBRARIES_DIR}"/_start_end.sh
+# shellcheck source=scripts/ci/libraries/_testing.sh
+. "${LIBRARIES_DIR}"/_testing.sh
 # shellcheck source=scripts/ci/libraries/_verbosity.sh
 . "${LIBRARIES_DIR}"/_verbosity.sh
 # shellcheck source=scripts/ci/libraries/_verify_image.sh
diff --git a/scripts/ci/libraries/_initialization.sh b/scripts/ci/libraries/_initialization.sh
index f924962..f82cb55 100644
--- a/scripts/ci/libraries/_initialization.sh
+++ b/scripts/ci/libraries/_initialization.sh
@@ -710,7 +710,7 @@ Initialization variables:
 
 Test variables:
 
-    TEST_TYPE: '${TEST_TYPE}'
+    TEST_TYPE: '${TEST_TYPE=}'
 
 EOF
     if [[ "${CI}" == "true" ]]; then
@@ -776,7 +776,6 @@ function initialization::make_constants_read_only() {
     readonly HELM_VERSION
     readonly KUBECTL_VERSION
 
-    readonly BACKEND
     readonly POSTGRES_VERSION
     readonly MYSQL_VERSION
 
diff --git a/scripts/ci/libraries/_parallel.sh b/scripts/ci/libraries/_parallel.sh
index dfe1c4d..739bae1 100644
--- a/scripts/ci/libraries/_parallel.sh
+++ b/scripts/ci/libraries/_parallel.sh
@@ -73,7 +73,7 @@ function parallel::monitor_loop() {
         do
             parallel_process=$(basename "${directory}")
 
-            echo "${COLOR_BLUE}### The last lines for ${parallel_process} process ###${COLOR_RESET}"
+            echo "${COLOR_BLUE}### The last lines for ${parallel_process} process: ${directory}/stdout ###${COLOR_RESET}"
             echo
             tail -2 "${directory}/stdout" || true
             echo
@@ -160,3 +160,36 @@ function parallel::print_job_summary_and_return_status_code() {
     done
     return "${return_code}"
 }
+
+function parallel::kill_all_running_docker_containers() {
+    echo
+    echo "${COLOR_BLUE}Kill all running docker containers${COLOR_RESET}"
+    echo
+    # shellcheck disable=SC2046
+    docker kill $(docker ps -q) || true
+}
+
+function parallel::system_prune_docker() {
+    echo
+    echo "${COLOR_BLUE}System-prune docker${COLOR_RESET}"
+    echo
+    docker_v system prune --force --volumes
+    echo
+}
+
+# Cleans up runner before test execution.
+#  * Kills all running docker containers
+#  * System prune to clean all the temporary/unnamed images and left-over volumes
+#  * Print information about available space and memory
+#  * Kills stale semaphore locks
+function parallel::cleanup_runner() {
+    start_end::group_start "Cleanup runner"
+    parallel::kill_all_running_docker_containers
+    parallel::system_prune_docker
+    docker_engine_resources::get_available_memory_in_docker
+    docker_engine_resources::get_available_cpus_in_docker
+    docker_engine_resources::get_available_disk_space_in_docker
+    docker_engine_resources::print_overall_stats
+    parallel::kill_stale_semaphore_locks
+    start_end::group_end
+}
diff --git a/scripts/ci/libraries/_testing.sh b/scripts/ci/libraries/_testing.sh
new file mode 100644
index 0000000..28d1fc6
--- /dev/null
+++ b/scripts/ci/libraries/_testing.sh
@@ -0,0 +1,116 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+export MEMORY_REQUIRED_FOR_INTEGRATION_TEST_PARALLEL_RUN=33000
+
+function testing::skip_tests_if_requested(){
+    if [[ -f ${BUILD_CACHE_DIR}/.skip_tests ]]; then
+        echo
+        echo "Skipping running tests !!!!!"
+        echo
+        exit
+    fi
+}
+
+function testing::get_docker_compose_local() {
+    DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/files.yml")
+    if [[ ${MOUNT_SELECTED_LOCAL_SOURCES} == "true" ]]; then
+        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/local.yml")
+    fi
+    if [[ ${MOUNT_ALL_LOCAL_SOURCES} == "true" ]]; then
+        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/local-all-sources.yml")
+    fi
+
+    if [[ ${GITHUB_ACTIONS} == "true" ]]; then
+        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/ga.yml")
+    fi
+
+    if [[ ${FORWARD_CREDENTIALS} == "true" ]]; then
+        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/forward-credentials.yml")
+    fi
+
+    if [[ -n ${INSTALL_AIRFLOW_VERSION=} || -n ${INSTALL_AIRFLOW_REFERENCE} ]]; then
+        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/remove-sources.yml")
+    fi
+    readonly DOCKER_COMPOSE_LOCAL
+}
+
+function testing::get_maximum_parallel_test_jobs() {
+    docker_engine_resources::get_available_cpus_in_docker
+    if [[ ${RUNS_ON} != *"self-hosted"* ]]; then
+        echo
+        echo "${COLOR_YELLOW}This is a Github Public runner - for now we are forcing max parallel Quarantined tests jobs to 1 for those${COLOR_RESET}"
+        echo
+        export MAX_PARALLEL_QUARANTINED_TEST_JOBS="1"
+    else
+        if [[ ${MAX_PARALLEL_QUARANTINED_TEST_JOBS=} != "" ]]; then
+            echo
+            echo "${COLOR_YELLOW}Maximum parallel Quarantined test jobs forced via MAX_PARALLEL_QUARANTINED_TEST_JOBS = ${MAX_PARALLEL_QUARANTINED_TEST_JOBS}${COLOR_RESET}"
+            echo
+        else
+            MAX_PARALLEL_QUARANTINED_TEST_JOBS=${CPUS_AVAILABLE_FOR_DOCKER}
+            echo
+            echo "${COLOR_YELLOW}Maximum parallel Quarantined test jobs set to number of CPUs available for Docker = ${MAX_PARALLEL_QUARANTINED_TEST_JOBS}${COLOR_RESET}"
+            echo
+        fi
+
+    fi
+
+    if [[ ${MAX_PARALLEL_TEST_JOBS=} != "" ]]; then
+        echo
+        echo "${COLOR_YELLOW}Maximum parallel test jobs forced via MAX_PARALLEL_TEST_JOBS = ${MAX_PARALLEL_TEST_JOBS}${COLOR_RESET}"
+        echo
+    else
+        MAX_PARALLEL_TEST_JOBS=${CPUS_AVAILABLE_FOR_DOCKER}
+        echo
+        echo "${COLOR_YELLOW}Maximum parallel test jobs set to number of CPUs available for Docker = ${MAX_PARALLEL_TEST_JOBS}${COLOR_RESET}"
+        echo
+    fi
+    export MAX_PARALLEL_TEST_JOBS
+}
+
+function testing::get_test_types_to_run() {
+    if [[ -n "${FORCE_TEST_TYPE=}" ]]; then
+        # Handle case where test type is forced from outside
+        export TEST_TYPES="${FORCE_TEST_TYPE}"
+    fi
+
+    if [[ -z "${TEST_TYPES=}" ]]; then
+        TEST_TYPES="Core Providers API CLI Integration Other WWW"
+        echo
+        echo "Test types not specified. Adding all: ${TEST_TYPES}"
+        echo
+    fi
+
+    if [[ -z "${FORCE_TEST_TYPE=}" ]]; then
+        # Add Postgres/MySQL special test types in case we are running several test types
+        if [[ ${BACKEND} == "postgres" && ${TEST_TYPES} != "Quarantined" ]]; then
+            TEST_TYPES="${TEST_TYPES} Postgres"
+            echo
+            echo "Added Postgres. Tests to run: ${TEST_TYPES}"
+            echo
+        fi
+        if [[ ${BACKEND} == "mysql" && ${TEST_TYPES} != "Quarantined" ]]; then
+            TEST_TYPES="${TEST_TYPES} MySQL"
+            echo
+            echo "Added MySQL. Tests to run: ${TEST_TYPES}"
+            echo
+        fi
+    fi
+    readonly TEST_TYPES
+}
diff --git a/scripts/ci/testing/ci_run_airflow_testing.sh b/scripts/ci/testing/ci_run_airflow_testing.sh
index af147ad..fa8c044 100755
--- a/scripts/ci/testing/ci_run_airflow_testing.sh
+++ b/scripts/ci/testing/ci_run_airflow_testing.sh
@@ -23,128 +23,13 @@ export RUN_TESTS
 SKIPPED_FAILED_JOB="Quarantined"
 export SKIPPED_FAILED_JOB
 
-# shellcheck source=scripts/ci/libraries/_script_init.sh
-. "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
-
-if [[ -f ${BUILD_CACHE_DIR}/.skip_tests ]]; then
-    echo
-    echo "Skipping running tests !!!!!"
-    echo
-    exit
-fi
-
-# In case we see too many failures on regular PRs from our users using GitHub Public runners
-# We can uncomment this and come back to sequential test-type execution
-#if [[ ${RUNS_ON} != *"self-hosted"* ]]; then
-#    echo
-#    echo "${COLOR_YELLOW}This is a Github Public runner - for now we are forcing max parallel jobs to 1 for those${COLOR_RESET}"
-#    echo "${COLOR_YELLOW}Until we fix memory usage to allow up to 2 parallel runs on those runners${COLOR_RESET}"
-#    echo
-#    # Forces testing in parallel in case the script is run on self-hosted runners
-#    export MAX_PARALLEL_TEST_JOBS="1"
-#fi
-
 SEMAPHORE_NAME="tests"
+export SEMAPHORE_NAME
 
-function prepare_tests_to_run() {
-    DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/files.yml")
-    if [[ ${MOUNT_SELECTED_LOCAL_SOURCES} == "true" ]]; then
-        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/local.yml")
-    fi
-    if [[ ${MOUNT_ALL_LOCAL_SOURCES} == "true" ]]; then
-        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/local-all-sources.yml")
-    fi
-
-    if [[ ${GITHUB_ACTIONS} == "true" ]]; then
-        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/ga.yml")
-    fi
-
-    if [[ ${FORWARD_CREDENTIALS} == "true" ]]; then
-        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/forward-credentials.yml")
-    fi
-
-    if [[ -n ${INSTALL_AIRFLOW_VERSION=} || -n ${INSTALL_AIRFLOW_REFERENCE} ]]; then
-        DOCKER_COMPOSE_LOCAL+=("-f" "${SCRIPTS_CI_DIR}/docker-compose/remove-sources.yml")
-    fi
-    readonly DOCKER_COMPOSE_LOCAL
-
-    if [[ -n "${FORCE_TEST_TYPE=}" ]]; then
-        # Handle case where test type is forced from outside
-        export TEST_TYPES="${FORCE_TEST_TYPE}"
-    fi
-
-    if [[ -z "${TEST_TYPES=}" ]]; then
-        TEST_TYPES="Core Providers API CLI Integration Other WWW"
-        echo
-        echo "Test types not specified. Adding all: ${TEST_TYPES}"
-        echo
-    fi
-
-    if [[ -z "${FORCE_TEST_TYPE=}" ]]; then
-        # Add Postgres/MySQL special test types in case we are running several test types
-        if [[ ${BACKEND} == "postgres" && ${TEST_TYPES} != "Quarantined" ]]; then
-            TEST_TYPES="${TEST_TYPES} Postgres"
-            echo
-            echo "Added Postgres. Tests to run: ${TEST_TYPES}"
-            echo
-        fi
-        if [[ ${BACKEND} == "mysql" && ${TEST_TYPES} != "Quarantined" ]]; then
-            TEST_TYPES="${TEST_TYPES} MySQL"
-            echo
-            echo "Added MySQL. Tests to run: ${TEST_TYPES}"
-            echo
-        fi
-    fi
-    readonly TEST_TYPES
-}
-
-function kill_all_running_docker_containers() {
-    echo
-    echo "${COLOR_BLUE}Kill all running docker containers${COLOR_RESET}"
-    echo
-    # shellcheck disable=SC2046
-    docker kill $(docker ps -q) || true
-}
+# shellcheck source=scripts/ci/libraries/_script_init.sh
+. "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
 
-function system_prune_docker() {
-    echo
-    echo "${COLOR_BLUE}System-prune docker${COLOR_RESET}"
-    echo
-    docker_v system prune --force --volumes
-    echo
-}
 
-function get_maximum_parallel_test_jobs() {
-    if [[ ${MAX_PARALLEL_TEST_JOBS=} != "" ]]; then
-        echo
-        echo "${COLOR_YELLOW}Maximum parallel test jobs forced vi MAX_PARALLEL_TEST_JOBS = ${MAX_PARALLEL_TEST_JOBS}${COLOR_RESET}"
-        echo
-    else
-        MAX_PARALLEL_TEST_JOBS=${CPUS_AVAILABLE_FOR_DOCKER}
-        echo
-        echo "${COLOR_YELLOW}Maximum parallel test jobs set to number of CPUs available for Docker = ${MAX_PARALLEL_TEST_JOBS}${COLOR_RESET}"
-        echo
-    fi
-    export MAX_PARALLEL_TEST_JOBS
-}
-
-# Cleans up runner before test execution.
-#  * Kills all running docker containers
-#  * System prune to clean all the temporary/unnamed images and left-over volumes
-#  * Print information about available space and memory
-#  * Kills stale semaphore locks
-function cleanup_runner() {
-    start_end::group_start "Cleanup runner"
-    kill_all_running_docker_containers
-    system_prune_docker
-    docker_engine_resources::get_available_memory_in_docker
-    docker_engine_resources::get_available_cpus_in_docker
-    docker_engine_resources::get_available_disk_space_in_docker
-    docker_engine_resources::print_overall_stats
-    get_maximum_parallel_test_jobs
-    parallel::kill_stale_semaphore_locks
-    start_end::group_end
-}
 
 # Starts test types in parallel
 # test_types_to_run - list of test types (it's not an array, it is space-separate list)
@@ -171,9 +56,6 @@ function run_test_types_in_parallel() {
     start_end::group_end
 }
 
-
-export MEMORY_REQUIRED_FOR_INTEGRATION_TEST_PARALLEL_RUN=33000
-
 # Runs all test types in parallel depending on the number of CPUs available
 # We monitors their progress, display the progress  and summarize the result when finished.
 #
@@ -188,7 +70,7 @@ export MEMORY_REQUIRED_FOR_INTEGRATION_TEST_PARALLEL_RUN=33000
 #   * MEMORY_AVAILABLE_FOR_DOCKER - memory that is available in docker (set by cleanup_runners)
 #
 function run_all_test_types_in_parallel() {
-    cleanup_runner
+    parallel::cleanup_runner
 
     start_end::group_start "Determine how to run the tests"
     echo
@@ -196,6 +78,7 @@ function run_all_test_types_in_parallel() {
     echo
 
     local run_integration_tests_separately="false"
+    # shellcheck disable=SC2153
     local test_types_to_run=${TEST_TYPES}
 
     if [[ ${test_types_to_run} == *"Integration"* ]]; then
@@ -222,7 +105,7 @@ function run_all_test_types_in_parallel() {
 
     run_test_types_in_parallel "${@}"
     if [[ ${run_integration_tests_separately} == "true" ]]; then
-        cleanup_runner
+        parallel::cleanup_runner
         test_types_to_run="Integration"
         run_test_types_in_parallel "${@}"
     fi
@@ -231,12 +114,19 @@ function run_all_test_types_in_parallel() {
     parallel::print_job_summary_and_return_status_code
 }
 
+
+testing::skip_tests_if_requested
+
 build_images::prepare_ci_build
 
 build_images::rebuild_ci_image_if_needed_with_group
 
-prepare_tests_to_run
-
 parallel::make_sure_gnu_parallel_is_installed
 
+testing::get_maximum_parallel_test_jobs
+
+testing::get_test_types_to_run
+
+testing::get_docker_compose_local
+
 run_all_test_types_in_parallel "${@}"
diff --git a/scripts/ci/testing/ci_run_quarantined_tests.sh b/scripts/ci/testing/ci_run_quarantined_tests.sh
new file mode 100755
index 0000000..0c1108e
--- /dev/null
+++ b/scripts/ci/testing/ci_run_quarantined_tests.sh
@@ -0,0 +1,87 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Enable automated tests execution
+RUN_TESTS="true"
+export RUN_TESTS
+
+SKIPPED_FAILED_JOB="Quarantined"
+export SKIPPED_FAILED_JOB
+
+SEMAPHORE_NAME="tests"
+export SEMAPHORE_NAME
+
+# shellcheck source=scripts/ci/libraries/_script_init.sh
+. "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"
+
+BACKEND_TEST_TYPES=(mysql postgres sqlite)
+
+# Starts test types in parallel
+# test_types_to_run - list of test types (it's not an array, it is space-separate list)
+# ${@} - additional arguments to pass to test execution
+function run_quarantined_backend_tests_in_parallel() {
+    start_end::group_start "Determining how to run the tests"
+    echo
+    echo "${COLOR_YELLOW}Running maximum ${MAX_PARALLEL_QUARANTINED_TEST_JOBS} test types in parallel${COLOR_RESET}"
+    echo
+    start_end::group_end
+    start_end::group_start "Monitoring Quarantined tests : ${BACKEND_TEST_TYPES[*]}"
+    parallel::initialize_monitoring
+    parallel::monitor_progress
+    mkdir -p "${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}"
+    TEST_TYPE="Quarantined"
+    export TEST_TYPE
+    for BACKEND in "${BACKEND_TEST_TYPES[@]}"
+    do
+        export BACKEND
+        mkdir -p "${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/${BACKEND}"
+        mkdir -p "${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/${BACKEND}"
+        export JOB_LOG="${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/${BACKEND}/stdout"
+        export PARALLEL_JOB_STATUS="${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/${BACKEND}/status"
+        # Each test job will get SIGTERM followed by SIGTERM 200ms later and SIGKILL 200ms later after 25 mins
+        # shellcheck disable=SC2086
+        parallel --ungroup --bg --semaphore --semaphorename "${SEMAPHORE_NAME}" \
+            --jobs "${MAX_PARALLEL_QUARANTINED_TEST_JOBS}" --timeout 1500 \
+            "$( dirname "${BASH_SOURCE[0]}" )/ci_run_single_airflow_test_in_docker.sh" "${@}" >${JOB_LOG} 2>&1
+    done
+    parallel --semaphore --semaphorename "${SEMAPHORE_NAME}" --wait
+    parallel::kill_monitor
+    start_end::group_end
+}
+
+testing::skip_tests_if_requested
+
+build_images::prepare_ci_build
+
+build_images::rebuild_ci_image_if_needed_with_group
+
+parallel::make_sure_gnu_parallel_is_installed
+
+testing::get_maximum_parallel_test_jobs
+
+testing::get_docker_compose_local
+
+run_quarantined_backend_tests_in_parallel "${@}"
+
+set +e
+
+parallel::print_job_summary_and_return_status_code
+
+echo "Those are quarantined tests so failure of those does not fail the whole build!"
+echo "Please look above for the output of failed tests to fix them!"
+echo
diff --git a/scripts/ci/testing/ci_run_single_airflow_test_in_docker.sh b/scripts/ci/testing/ci_run_single_airflow_test_in_docker.sh
index 76b710e..0bf415f 100755
--- a/scripts/ci/testing/ci_run_single_airflow_test_in_docker.sh
+++ b/scripts/ci/testing/ci_run_single_airflow_test_in_docker.sh
@@ -90,7 +90,7 @@ function run_airflow_testing_in_docker() {
         echo "Making sure docker-compose is down and remnants removed"
         echo
         docker-compose --log-level INFO -f "${SCRIPTS_CI_DIR}/docker-compose/base.yml" \
-            --project-name "airflow-${TEST_TYPE}" \
+            --project-name "airflow-${TEST_TYPE}-${BACKEND}" \
             down --remove-orphans \
             --volumes --timeout 10
         docker-compose --log-level INFO \
@@ -98,11 +98,11 @@ function run_airflow_testing_in_docker() {
           -f "${SCRIPTS_CI_DIR}/docker-compose/backend-${BACKEND}.yml" \
           "${INTEGRATIONS[@]}" \
           "${DOCKER_COMPOSE_LOCAL[@]}" \
-          --project-name "airflow-${TEST_TYPE}" \
+          --project-name "airflow-${TEST_TYPE}-${BACKEND}" \
              run airflow "${@}"
         exit_code=$?
         docker-compose --log-level INFO -f "${SCRIPTS_CI_DIR}/docker-compose/base.yml" \
-            --project-name "airflow-${TEST_TYPE}" \
+            --project-name "airflow-${TEST_TYPE}-${BACKEND}" \
             down --remove-orphans \
             --volumes --timeout 10
         if [[ ${exit_code} == "254" && ${try_num} != "5" ]]; then

[airflow] 25/36: Update import path and fix typo in `dag-run.rst` (#15201)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 863250d2965bcaaaf73a5e821f86a7332a5e8bda
Author: eladkal <45...@users.noreply.github.com>
AuthorDate: Mon Apr 5 14:46:58 2021 +0300

    Update import path and fix typo in `dag-run.rst` (#15201)
    
    1. Fix typo: parametrized -> parameterized
    2. Update import path: `from airflow.operators.bash_operator import BashOperator` -> `from airflow.operators.bash import BashOperator`
    
    (cherry picked from commit 4099108f554130cf3f87ba33b9d6084a74e70231)
---
 docs/apache-airflow/dag-run.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/apache-airflow/dag-run.rst b/docs/apache-airflow/dag-run.rst
index 72204f1..dbcf68a 100644
--- a/docs/apache-airflow/dag-run.rst
+++ b/docs/apache-airflow/dag-run.rst
@@ -208,10 +208,10 @@ Example of a parameterized DAG:
 .. code-block:: python
 
     from airflow import DAG
-    from airflow.operators.bash_operator import BashOperator
+    from airflow.operators.bash import BashOperator
     from airflow.utils.dates import days_ago
 
-    dag = DAG("example_parametrized_dag", schedule_interval=None, start_date=days_ago(2))
+    dag = DAG("example_parameterized_dag", schedule_interval=None, start_date=days_ago(2))
 
     parameterized_task = BashOperator(
         task_id='parameterized_task',
@@ -227,7 +227,7 @@ Using CLI
 
 .. code-block:: bash
 
-    airflow dags trigger --conf '{"conf1": "value1"}' example_parametrized_dag
+    airflow dags trigger --conf '{"conf1": "value1"}' example_parameterized_dag
 
 Using UI
 ^^^^^^^^^^
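For completeness (an illustration added here, not part of the doc change): the value passed with `--conf` above is exposed to templated fields through `dag_run.conf`. A minimal example of a task consuming it, using the corrected import path:

```python
# Illustrative snippet showing how the conf from
# `airflow dags trigger --conf '{"conf1": "value1"}' example_parameterized_dag`
# can be read inside a task via Jinja templating.
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

dag = DAG("example_parameterized_dag", schedule_interval=None, start_date=days_ago(2))

parameterized_task = BashOperator(
    task_id="parameterized_task",
    # dag_run.conf is a dict; guard against runs triggered without any conf.
    bash_command="echo {{ dag_run.conf.get('conf1', '') if dag_run else '' }}",
    dag=dag,
)
```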

[airflow] 11/36: Adds 'Trino' provider (with lower memory footprint for tests) (#15187)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 0558900166e1fa0a6f3d94c247173975ed8f23ba
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Tue Apr 6 19:19:19 2021 +0200

    Adds 'Trino' provider (with lower memory footprint for tests) (#15187)
    
    While checking the status of various CI tests we came to the
    conclusion that the Presto integration took a lot of memory (~1GB)
    and was the main source of failures during integration tests,
    especially with MySQL 8. The attempt to fine-tune the memory
    used led to the discovery that Presto DB had stopped
    publishing their Docker image (prestosql/presto) - apparently
    in the aftermath of splitting Trino off from Presto.
    
    The split-off was already discussed in #14281, and it was planned
    to add support for Trino, which is the more community-driven
    fork of Presto - Presto remained under Facebook governance,
    while Trino is an effort continued by the original creators.
    
    You can read more about it in the announcement:
    https://trino.io/blog/2020/12/27/announcing-trino.html. While
    Presto continues its way under The Linux Foundation, Trino
    lives its own life and keeps on maintaining all artifacts and
    libraries (including the image). That allowed us to update
    our tests and decrease the memory footprint by around 400MB.
    
    This commit:
    
    * adds the new Trino provider
    * removes `presto` integration and replaces it with `trino`
    * the `trino` integration image is built with 400MB lower memory
      requirements and published as `apache/airflow:trino-*`
    * moves the integration tests from Presto to Trino
    
    Fixes: #14281
    (cherry picked from commit eae22cec9c87e8dad4d6e8599e45af1bdd452062)
---
 BREEZE.rst                                         |   2 +-
 CONTRIBUTING.rst                                   |  47 +--
 IMAGES.rst                                         |   4 +-
 INSTALL                                            |   4 +-
 TESTING.rst                                        |   4 +-
 airflow/providers/dependencies.json                |   4 +-
 .../cloud/example_dags/example_trino_to_gcs.py     | 150 ++++++++++
 .../google/cloud/transfers/trino_to_gcs.py         | 210 +++++++++++++
 airflow/providers/google/provider.yaml             |   4 +
 airflow/providers/mysql/provider.yaml              |   5 +-
 .../providers/mysql/transfers/trino_to_mysql.py    |  83 ++++++
 airflow/providers/trino/CHANGELOG.rst              |  25 ++
 .../providers/trino/__init__.py                    |  27 +-
 .../providers/trino/hooks/__init__.py              |  27 +-
 airflow/providers/trino/hooks/trino.py             | 191 ++++++++++++
 .../providers/trino/provider.yaml                  |  43 ++-
 airflow/sensors/sql.py                             |   1 +
 airflow/utils/db.py                                |  10 +
 breeze                                             |   2 +-
 breeze-complete                                    |   2 +-
 .../operators/transfer/trino_to_gcs.rst            | 142 +++++++++
 docs/apache-airflow-providers-trino/commits.rst    |  26 ++
 docs/apache-airflow-providers-trino/index.rst      |  43 +++
 docs/apache-airflow/extra-packages-ref.rst         |   4 +-
 docs/exts/docs_build/errors.py                     |   2 +-
 docs/exts/docs_build/spelling_checks.py            |   2 +-
 docs/integration-logos/trino/trino-og.png          | Bin 0 -> 34219 bytes
 docs/spelling_wordlist.txt                         |   1 +
 scripts/ci/docker-compose/integration-kerberos.yml |   2 +-
 scripts/ci/docker-compose/integration-redis.yml    |   2 +-
 ...ntegration-presto.yml => integration-trino.yml} |  20 +-
 .../krb5-kdc-server/utils/create_service.sh        |   2 +-
 .../ci/dockerfiles/{presto => trino}/Dockerfile    |  14 +-
 .../{presto => trino}/build_and_push.sh            |  14 +-
 .../ci/dockerfiles/{presto => trino}/entrypoint.sh |  43 ++-
 scripts/ci/libraries/_initialization.sh            |   2 +-
 scripts/in_container/check_environment.sh          |  14 +-
 .../run_install_and_test_provider_packages.sh      |   8 +-
 setup.py                                           |   7 +-
 tests/cli/commands/test_connection_command.py      |   4 +
 tests/conftest.py                                  |   2 +-
 tests/core/test_providers_manager.py               |   2 +
 .../google/cloud/transfers/test_trino_to_gcs.py    | 331 +++++++++++++++++++++
 .../cloud/transfers/test_trino_to_gcs_system.py    | 169 +++++++++++
 .../mysql/transfers/test_trino_to_mysql.py         |  73 +++++
 tests/providers/presto/hooks/test_presto.py        |  25 --
 .../providers/trino/__init__.py                    |  27 +-
 .../providers/trino/hooks/__init__.py              |  27 +-
 .../test_presto.py => trino/hooks/test_trino.py}   |  54 ++--
 49 files changed, 1641 insertions(+), 266 deletions(-)
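As an orientation for the new provider (an illustrative sketch, not taken from the commit): the hook added in `airflow/providers/trino/hooks/trino.py` is used like the other DB-API based hooks. The connection id and method name below follow the usual `DbApiHook` conventions and are assumptions, not a verified API:

```python
# Illustrative sketch; assumes the apache-airflow-providers-trino package is
# installed and a 'trino_default' connection is configured. get_records() is
# the standard DbApiHook method and is assumed to be available on TrinoHook.
from airflow.providers.trino.hooks.trino import TrinoHook

hook = TrinoHook(trino_conn_id="trino_default")
rows = hook.get_records("SELECT 1")
print(rows)
```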

diff --git a/BREEZE.rst b/BREEZE.rst
index 72633e8..66532bd 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -2444,7 +2444,7 @@ This is the current syntax for  `./breeze <./breeze>`_:
           start all integrations. Selected integrations are not saved for future execution.
           One of:
 
-                 cassandra kerberos mongo openldap pinot presto rabbitmq redis statsd all
+                 cassandra kerberos mongo openldap pinot rabbitmq redis statsd trino all
 
   --init-script INIT_SCRIPT_FILE
           Initialization script name - Sourced from files/airflow-breeze-config. Default value
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
index 19c4077..e82fd4e 100644
--- a/CONTRIBUTING.rst
+++ b/CONTRIBUTING.rst
@@ -63,6 +63,14 @@ Fix Bugs
 Look through the GitHub issues for bugs. Anything is open to whoever wants to
 implement it.
 
+Issue reporting and resolution process
+--------------------------------------
+
+The Apache Airflow project uses a set of labels for tracking and triaging issues, as
+well as a set of priorities and milestones to track how and when the enhancements and bug
+fixes make it into an Airflow release. This is documented as part of
+the `Issue reporting and resolution process <ISSUE_TRIAGE_PROCESS.rst>`_,
+
 Implement Features
 ------------------
 
@@ -117,7 +125,7 @@ Committers/Maintainers
 Committers are community members that have write access to the project’s repositories, i.e., they can modify the code,
 documentation, and website by themselves and also accept other contributions.
 
-The official list of committers can be found `here <https://airflow.apache.org/docs/stable/project.html#committers>`__.
+The official list of committers can be found `here <https://airflow.apache.org/docs/apache-airflow/stable/project.html#committers>`__.
 
 Additionally, committers are listed in a few other places (some of these may only be visible to existing committers):
 
@@ -188,9 +196,14 @@ From the `apache/airflow <https://github.com/apache/airflow>`_ repo,
 
 Step 2: Configure Your Environment
 ----------------------------------
-Configure the Docker-based Breeze development environment and run tests.
 
-You can use the default Breeze configuration as follows:
+You can use either a local virtual env or a Docker-based env. The differences
+between the two are explained `here <https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#development-environments>`_.
+
+The local env's instructions can be found in full in the  `LOCAL_VIRTUALENV.rst <https://github.com/apache/airflow/blob/master/LOCAL_VIRTUALENV.rst>`_ file.
+The Docker env is here to maintain a consistent and common development environment so that you can replicate CI failures locally and work on solving them locally rather by pushing to CI.
+
+You can configure the Docker-based Breeze development environment as follows:
 
 1. Install the latest versions of the Docker Community Edition
    and Docker Compose and add them to the PATH.
@@ -245,7 +258,7 @@ Step 4: Prepare PR
 
    For example, to address this example issue, do the following:
 
-   * Read about `email configuration in Airflow </docs/howto/email-config.rst>`__.
+   * Read about `email configuration in Airflow </docs/apache-airflow/howto/email-config.rst>`__.
 
    * Find the class you should modify. For the example GitHub issue,
      this is `email.py <https://github.com/apache/airflow/blob/master/airflow/utils/email.py>`__.
@@ -297,7 +310,7 @@ Step 4: Prepare PR
    and send it through the right path:
 
    * In case of a "no-code" change, approval will generate a comment that the PR can be merged and no
-     tests are needed. This is usually when the change modifies some non-documentation related rst
+     tests are needed. This is usually when the change modifies some non-documentation related RST
      files (such as this file). No python tests are run and no CI images are built for such PR. Usually
      it can be approved and merged few minutes after it is submitted (unless there is a big queue of jobs).
 
@@ -368,7 +381,7 @@ these guidelines:
     of the same PR. Doc string is often sufficient. Make sure to follow the
     Sphinx compatible standards.
 
--   Make sure your code fulfils all the
+-   Make sure your code fulfills all the
     `static code checks <STATIC_CODE_CHECKS.rst#pre-commit-hooks>`__ we have in our code. The easiest way
     to make sure of that is to use `pre-commit hooks <STATIC_CODE_CHECKS.rst#pre-commit-hooks>`__
 
@@ -414,7 +427,7 @@ The production images are build in DockerHub from:
 * ``2.0.*``, ``2.0.*rc*`` releases from the ``v2-0-stable`` branch when we prepare release candidates and
   final releases. There are no production images prepared from v2-0-stable branch.
 
-Similar rules apply to ``1.10.x`` releases until June 2020. We have ``v1-10-test`` and ``v1-10-stable``
+Similar rules apply to ``1.10.x`` releases until June 2021. We have ``v1-10-test`` and ``v1-10-stable``
 branches there.
 
 Development Environments
@@ -581,8 +594,8 @@ google_auth, grpc, hashicorp, hdfs, hive, http, imap, jdbc, jenkins, jira, kerbe
 ldap, microsoft.azure, microsoft.mssql, microsoft.winrm, mongo, mssql, mysql, neo4j, odbc, openfaas,
 opsgenie, oracle, pagerduty, papermill, password, pinot, plexus, postgres, presto, qds, qubole,
 rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry, sftp, singularity, slack,
-snowflake, spark, sqlite, ssh, statsd, tableau, telegram, vertica, virtualenv, webhdfs, winrm,
-yandex, zendesk
+snowflake, spark, sqlite, ssh, statsd, tableau, telegram, trino, vertica, virtualenv, webhdfs,
+winrm, yandex, zendesk
 
   .. END EXTRAS HERE
 
@@ -647,11 +660,11 @@ apache.hive                amazon,microsoft.mssql,mysql,presto,samba,vertica
 apache.livy                http
 dingding                   http
 discord                    http
-google                     amazon,apache.beam,apache.cassandra,cncf.kubernetes,facebook,microsoft.azure,microsoft.mssql,mysql,postgres,presto,salesforce,sftp,ssh
+google                     amazon,apache.beam,apache.cassandra,cncf.kubernetes,facebook,microsoft.azure,microsoft.mssql,mysql,postgres,presto,salesforce,sftp,ssh,trino
 hashicorp                  google
 microsoft.azure            google,oracle
 microsoft.mssql            odbc
-mysql                      amazon,presto,vertica
+mysql                      amazon,presto,trino,vertica
 opsgenie                   http
 postgres                   amazon
 salesforce                 tableau
@@ -742,7 +755,7 @@ providers.
   not only "green path"
 
 * Integration tests where 'local' integration with a component is possible (for example tests with
-  MySQL/Postgres DB/Presto/Kerberos all have integration tests which run with real, dockerised components
+  MySQL/Postgres DB/Trino/Kerberos all have integration tests which run with real, dockerized components
 
 * System Tests which provide end-to-end testing, usually testing together several operators, sensors,
   transfers connecting to a real external system
@@ -755,8 +768,8 @@ Dependency management
 
 Airflow is not a standard python project. Most of the python projects fall into one of two types -
 application or library. As described in
-[StackOverflow Question](https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions)
-decision whether to pin (freeze) dependency versions for a python project depends on the type. For
+`this StackOverflow question <https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions>`_,
+the decision whether to pin (freeze) dependency versions for a python project depends on the type. For
 applications, dependencies should be pinned, but for libraries, they should be open.
 
 For application, pinning the dependencies makes it more stable to install in the future - because new
@@ -964,8 +977,8 @@ If this function is designed to be called by "end-users" (i.e. DAG authors) then
       ...
       # You SHOULD not commit the session here. The wrapper will take care of commit()/rollback() if exception
 
-Don't use time() for duration calcuations
------------------------------------------
+Don't use time() for duration calculations
+------------------------------------------
 
 If you wish to compute the time difference between two events with in the same process, use
 ``time.monotonic()``, not ``time.time()`` nor ``timzeone.utcnow()``.
@@ -1011,7 +1024,7 @@ Naming Conventions for provider packages
 In Airflow 2.0 we standardized and enforced naming for provider packages, modules and classes.
 those rules (introduced as AIP-21) were not only introduced but enforced using automated checks
 that verify if the naming conventions are followed. Here is a brief summary of the rules, for
-detailed discussion you can go to [AIP-21 Changes in import paths](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths)
+detailed discussion you can go to `AIP-21 Changes in import paths <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>`_
 
 The rules are as follows:
 
diff --git a/IMAGES.rst b/IMAGES.rst
index 40299b6..23012a2 100644
--- a/IMAGES.rst
+++ b/IMAGES.rst
@@ -116,7 +116,7 @@ parameter to Breeze:
 
 .. code-block:: bash
 
-  ./breeze build-image --python 3.7 --additional-extras=presto \
+  ./breeze build-image --python 3.7 --additional-extras=trino \
       --production-image --install-airflow-version=2.0.0
 
 
@@ -163,7 +163,7 @@ You can also skip installing airflow and install it from locally provided files
 
 .. code-block:: bash
 
-  ./breeze build-image --python 3.7 --additional-extras=presto \
+  ./breeze build-image --python 3.7 --additional-extras=trino \
       --production-image --disable-pypi-when-building --install-from-local-files-when-building
 
 In this case you airflow and all packages (.whl files) should be placed in ``docker-context-files`` folder.
diff --git a/INSTALL b/INSTALL
index d175aa1..34fccd2 100644
--- a/INSTALL
+++ b/INSTALL
@@ -106,8 +106,8 @@ google_auth, grpc, hashicorp, hdfs, hive, http, imap, jdbc, jenkins, jira, kerbe
 ldap, microsoft.azure, microsoft.mssql, microsoft.winrm, mongo, mssql, mysql, neo4j, odbc, openfaas,
 opsgenie, oracle, pagerduty, papermill, password, pinot, plexus, postgres, presto, qds, qubole,
 rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry, sftp, singularity, slack,
-snowflake, spark, sqlite, ssh, statsd, tableau, telegram, vertica, virtualenv, webhdfs, winrm,
-yandex, zendesk
+snowflake, spark, sqlite, ssh, statsd, tableau, telegram, trino, vertica, virtualenv, webhdfs,
+winrm, yandex, zendesk
 
 # END EXTRAS HERE
 
diff --git a/TESTING.rst b/TESTING.rst
index c56f6fd..102cac8 100644
--- a/TESTING.rst
+++ b/TESTING.rst
@@ -281,12 +281,12 @@ The following integrations are available:
      - Integration required for OpenLDAP hooks
    * - pinot
      - Integration required for Apache Pinot hooks
-   * - presto
-     - Integration required for Presto hooks
    * - rabbitmq
      - Integration required for Celery executor tests
    * - redis
      - Integration required for Celery executor tests
+   * - trino
+     - Integration required for Trino hooks
 
 To start the ``mongo`` integration only, enter:
 
diff --git a/airflow/providers/dependencies.json b/airflow/providers/dependencies.json
index b01e96c..81a3ba4 100644
--- a/airflow/providers/dependencies.json
+++ b/airflow/providers/dependencies.json
@@ -44,7 +44,8 @@
     "presto",
     "salesforce",
     "sftp",
-    "ssh"
+    "ssh",
+    "trino"
   ],
   "hashicorp": [
     "google"
@@ -59,6 +60,7 @@
   "mysql": [
     "amazon",
     "presto",
+    "trino",
     "vertica"
   ],
   "opsgenie": [
diff --git a/airflow/providers/google/cloud/example_dags/example_trino_to_gcs.py b/airflow/providers/google/cloud/example_dags/example_trino_to_gcs.py
new file mode 100644
index 0000000..32dc8a0
--- /dev/null
+++ b/airflow/providers/google/cloud/example_dags/example_trino_to_gcs.py
@@ -0,0 +1,150 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Example DAG using TrinoToGCSOperator.
+"""
+import os
+import re
+
+from airflow import models
+from airflow.providers.google.cloud.operators.bigquery import (
+    BigQueryCreateEmptyDatasetOperator,
+    BigQueryCreateExternalTableOperator,
+    BigQueryDeleteDatasetOperator,
+    BigQueryExecuteQueryOperator,
+)
+from airflow.providers.google.cloud.transfers.trino_to_gcs import TrinoToGCSOperator
+from airflow.utils.dates import days_ago
+
+GCP_PROJECT_ID = os.environ.get("GCP_PROJECT_ID", 'example-project')
+GCS_BUCKET = os.environ.get("GCP_TRINO_TO_GCS_BUCKET_NAME", "test-trino-to-gcs-bucket")
+DATASET_NAME = os.environ.get("GCP_TRINO_TO_GCS_DATASET_NAME", "test_trino_to_gcs_dataset")
+
+SOURCE_MULTIPLE_TYPES = "memory.default.test_multiple_types"
+SOURCE_CUSTOMER_TABLE = "tpch.sf1.customer"
+
+
+def safe_name(s: str) -> str:
+    """
+    Remove invalid characters for filename
+    """
+    return re.sub("[^0-9a-zA-Z_]+", "_", s)
+
+
+with models.DAG(
+    dag_id="example_trino_to_gcs",
+    schedule_interval=None,  # Override to match your needs
+    start_date=days_ago(1),
+    tags=["example"],
+) as dag:
+
+    create_dataset = BigQueryCreateEmptyDatasetOperator(task_id="create-dataset", dataset_id=DATASET_NAME)
+
+    delete_dataset = BigQueryDeleteDatasetOperator(
+        task_id="delete_dataset", dataset_id=DATASET_NAME, delete_contents=True
+    )
+
+    # [START howto_operator_trino_to_gcs_basic]
+    trino_to_gcs_basic = TrinoToGCSOperator(
+        task_id="trino_to_gcs_basic",
+        sql=f"select * from {SOURCE_MULTIPLE_TYPES}",
+        bucket=GCS_BUCKET,
+        filename=f"{safe_name(SOURCE_MULTIPLE_TYPES)}.{{}}.json",
+    )
+    # [END howto_operator_trino_to_gcs_basic]
+
+    # [START howto_operator_trino_to_gcs_multiple_types]
+    trino_to_gcs_multiple_types = TrinoToGCSOperator(
+        task_id="trino_to_gcs_multiple_types",
+        sql=f"select * from {SOURCE_MULTIPLE_TYPES}",
+        bucket=GCS_BUCKET,
+        filename=f"{safe_name(SOURCE_MULTIPLE_TYPES)}.{{}}.json",
+        schema_filename=f"{safe_name(SOURCE_MULTIPLE_TYPES)}-schema.json",
+        gzip=False,
+    )
+    # [END howto_operator_trino_to_gcs_multiple_types]
+
+    # [START howto_operator_create_external_table_multiple_types]
+    create_external_table_multiple_types = BigQueryCreateExternalTableOperator(
+        task_id="create_external_table_multiple_types",
+        bucket=GCS_BUCKET,
+        source_objects=[f"{safe_name(SOURCE_MULTIPLE_TYPES)}.*.json"],
+        source_format="NEWLINE_DELIMITED_JSON",
+        destination_project_dataset_table=f"{DATASET_NAME}.{safe_name(SOURCE_MULTIPLE_TYPES)}",
+        schema_object=f"{safe_name(SOURCE_MULTIPLE_TYPES)}-schema.json",
+    )
+    # [END howto_operator_create_external_table_multiple_types]
+
+    read_data_from_gcs_multiple_types = BigQueryExecuteQueryOperator(
+        task_id="read_data_from_gcs_multiple_types",
+        sql=f"SELECT COUNT(*) FROM `{GCP_PROJECT_ID}.{DATASET_NAME}.{safe_name(SOURCE_MULTIPLE_TYPES)}`",
+        use_legacy_sql=False,
+    )
+
+    # [START howto_operator_trino_to_gcs_many_chunks]
+    trino_to_gcs_many_chunks = TrinoToGCSOperator(
+        task_id="trino_to_gcs_many_chunks",
+        sql=f"select * from {SOURCE_CUSTOMER_TABLE}",
+        bucket=GCS_BUCKET,
+        filename=f"{safe_name(SOURCE_CUSTOMER_TABLE)}.{{}}.json",
+        schema_filename=f"{safe_name(SOURCE_CUSTOMER_TABLE)}-schema.json",
+        approx_max_file_size_bytes=10_000_000,
+        gzip=False,
+    )
+    # [END howto_operator_trino_to_gcs_many_chunks]
+
+    create_external_table_many_chunks = BigQueryCreateExternalTableOperator(
+        task_id="create_external_table_many_chunks",
+        bucket=GCS_BUCKET,
+        source_objects=[f"{safe_name(SOURCE_CUSTOMER_TABLE)}.*.json"],
+        source_format="NEWLINE_DELIMITED_JSON",
+        destination_project_dataset_table=f"{DATASET_NAME}.{safe_name(SOURCE_CUSTOMER_TABLE)}",
+        schema_object=f"{safe_name(SOURCE_CUSTOMER_TABLE)}-schema.json",
+    )
+
+    # [START howto_operator_read_data_from_gcs_many_chunks]
+    read_data_from_gcs_many_chunks = BigQueryExecuteQueryOperator(
+        task_id="read_data_from_gcs_many_chunks",
+        sql=f"SELECT COUNT(*) FROM `{GCP_PROJECT_ID}.{DATASET_NAME}.{safe_name(SOURCE_CUSTOMER_TABLE)}`",
+        use_legacy_sql=False,
+    )
+    # [END howto_operator_read_data_from_gcs_many_chunks]
+
+    # [START howto_operator_trino_to_gcs_csv]
+    trino_to_gcs_csv = TrinoToGCSOperator(
+        task_id="trino_to_gcs_csv",
+        sql=f"select * from {SOURCE_MULTIPLE_TYPES}",
+        bucket=GCS_BUCKET,
+        filename=f"{safe_name(SOURCE_MULTIPLE_TYPES)}.{{}}.csv",
+        schema_filename=f"{safe_name(SOURCE_MULTIPLE_TYPES)}-schema.json",
+        export_format="csv",
+    )
+    # [END howto_operator_trino_to_gcs_csv]
+
+    create_dataset >> trino_to_gcs_basic
+    create_dataset >> trino_to_gcs_multiple_types
+    create_dataset >> trino_to_gcs_many_chunks
+    create_dataset >> trino_to_gcs_csv
+
+    trino_to_gcs_multiple_types >> create_external_table_multiple_types >> read_data_from_gcs_multiple_types
+    trino_to_gcs_many_chunks >> create_external_table_many_chunks >> read_data_from_gcs_many_chunks
+
+    trino_to_gcs_basic >> delete_dataset
+    trino_to_gcs_csv >> delete_dataset
+    read_data_from_gcs_multiple_types >> delete_dataset
+    read_data_from_gcs_many_chunks >> delete_dataset
diff --git a/airflow/providers/google/cloud/transfers/trino_to_gcs.py b/airflow/providers/google/cloud/transfers/trino_to_gcs.py
new file mode 100644
index 0000000..e2f2306
--- /dev/null
+++ b/airflow/providers/google/cloud/transfers/trino_to_gcs.py
@@ -0,0 +1,210 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from typing import Any, Dict, List, Tuple
+
+from trino.client import TrinoResult
+from trino.dbapi import Cursor as TrinoCursor
+
+from airflow.providers.google.cloud.transfers.sql_to_gcs import BaseSQLToGCSOperator
+from airflow.providers.trino.hooks.trino import TrinoHook
+from airflow.utils.decorators import apply_defaults
+
+
+class _TrinoToGCSTrinoCursorAdapter:
+    """
+    An adapter that adds additional feature to the Trino cursor.
+
+    The implementation of cursor in the trino library is not sufficient.
+    The following changes have been made:
+
+    * The poke mechanism for row. You can look at the next row without consuming it.
+    * The description attribute is available before reading the first row. Thanks to the poke mechanism.
+    * the iterator interface has been implemented.
+
+    A detailed description of the class methods is available in
+    `PEP-249 <https://www.python.org/dev/peps/pep-0249/>`__.
+    """
+
+    def __init__(self, cursor: TrinoCursor):
+        self.cursor: TrinoCursor = cursor
+        self.rows: List[Any] = []
+        self.initialized: bool = False
+
+    @property
+    def description(self) -> List[Tuple]:
+        """
+        This read-only attribute is a sequence of 7-item sequences.
+
+        Each of these sequences contains information describing one result column:
+
+        * ``name``
+        * ``type_code``
+        * ``display_size``
+        * ``internal_size``
+        * ``precision``
+        * ``scale``
+        * ``null_ok``
+
+        The first two items (``name`` and ``type_code``) are mandatory, the other
+        five are optional and are set to None if no meaningful values can be provided.
+        """
+        if not self.initialized:
+            # Peek for first row to load description.
+            self.peekone()
+        return self.cursor.description
+
+    @property
+    def rowcount(self) -> int:
+        """The read-only attribute specifies the number of rows"""
+        return self.cursor.rowcount
+
+    def close(self) -> None:
+        """Close the cursor now"""
+        self.cursor.close()
+
+    def execute(self, *args, **kwargs) -> TrinoResult:
+        """Prepare and execute a database operation (query or command)."""
+        self.initialized = False
+        self.rows = []
+        return self.cursor.execute(*args, **kwargs)
+
+    def executemany(self, *args, **kwargs):
+        """
+        Prepare a database operation (query or command) and then execute it against all parameter
+        sequences or mappings found in the sequence seq_of_parameters.
+        """
+        self.initialized = False
+        self.rows = []
+        return self.cursor.executemany(*args, **kwargs)
+
+    def peekone(self) -> Any:
+        """Return the next row without consuming it."""
+        self.initialized = True
+        element = self.cursor.fetchone()
+        self.rows.insert(0, element)
+        return element
+
+    def fetchone(self) -> Any:
+        """
+        Fetch the next row of a query result set, returning a single sequence, or
+        ``None`` when no more data is available.
+        """
+        if self.rows:
+            return self.rows.pop(0)
+        return self.cursor.fetchone()
+
+    def fetchmany(self, size=None) -> list:
+        """
+        Fetch the next set of rows of a query result, returning a sequence of sequences
+        (e.g. a list of tuples). An empty sequence is returned when no more rows are available.
+        """
+        if size is None:
+            size = self.cursor.arraysize
+
+        result = []
+        for _ in range(size):
+            row = self.fetchone()
+            if row is None:
+                break
+            result.append(row)
+
+        return result
+
+    def __next__(self) -> Any:
+        """
+        Return the next row from the currently executing SQL statement, using the same semantics as
+        ``.fetchone()``. A ``StopIteration`` exception is raised when the result set is exhausted.
+        """
+        result = self.fetchone()
+        if result is None:
+            raise StopIteration()
+        return result
+
+    def __iter__(self) -> "_TrinoToGCSTrinoCursorAdapter":
+        """Return self to make cursors compatible to the iteration protocol"""
+        return self
+
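
To make the peek behaviour described above concrete, here is a minimal sketch (not part of the diff itself) that exercises the adapter against a hypothetical in-memory cursor stub; only the ``peekone``/``fetchone`` semantics of the class above are assumed:

    from airflow.providers.google.cloud.transfers.trino_to_gcs import _TrinoToGCSTrinoCursorAdapter


    class _StubCursor:
        """Hypothetical stand-in for trino.dbapi.Cursor, used only for illustration."""

        description = [("some_num", "integer", None, None, None, None, None)]
        rowcount = -1

        def __init__(self, rows):
            self._rows = iter(rows)

        def fetchone(self):
            return next(self._rows, None)


    adapter = _TrinoToGCSTrinoCursorAdapter(_StubCursor([(1,), (2,)]))
    assert adapter.peekone() == (1,)   # look at the next row without consuming it
    assert adapter.fetchone() == (1,)  # the peeked row is served first
    assert adapter.fetchone() == (2,)
    assert adapter.fetchone() is None  # result set exhausted
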
+
+class TrinoToGCSOperator(BaseSQLToGCSOperator):
+    """Copy data from TrinoDB to Google Cloud Storage in JSON or CSV format.
+
+    :param trino_conn_id: Reference to a specific Trino hook.
+    :type trino_conn_id: str
+    """
+
+    ui_color = "#a0e08c"
+
+    type_map = {
+        "BOOLEAN": "BOOL",
+        "TINYINT": "INT64",
+        "SMALLINT": "INT64",
+        "INTEGER": "INT64",
+        "BIGINT": "INT64",
+        "REAL": "FLOAT64",
+        "DOUBLE": "FLOAT64",
+        "DECIMAL": "NUMERIC",
+        "VARCHAR": "STRING",
+        "CHAR": "STRING",
+        "VARBINARY": "BYTES",
+        "JSON": "STRING",
+        "DATE": "DATE",
+        "TIME": "TIME",
+        # BigQuery does not support TIME with time zone natively.
+        "TIME WITH TIME ZONE": "STRING",
+        "TIMESTAMP": "TIMESTAMP",
+        # BigQuery supports a narrow range of time zones during import.
+        # You should use the TIMESTAMP function if you want the TIMESTAMP type
+        "TIMESTAMP WITH TIME ZONE": "STRING",
+        "IPADDRESS": "STRING",
+        "UUID": "STRING",
+    }
+
+    @apply_defaults
+    def __init__(self, *, trino_conn_id: str = "trino_default", **kwargs):
+        super().__init__(**kwargs)
+        self.trino_conn_id = trino_conn_id
+
+    def query(self):
+        """Queries trino and returns a cursor to the results."""
+        trino = TrinoHook(trino_conn_id=self.trino_conn_id)
+        conn = trino.get_conn()
+        cursor = conn.cursor()
+        self.log.info("Executing: %s", self.sql)
+        cursor.execute(self.sql)
+        return _TrinoToGCSTrinoCursorAdapter(cursor)
+
+    def field_to_bigquery(self, field) -> Dict[str, str]:
+        """Convert trino field type to BigQuery field type."""
+        clear_field_type = field[1].upper()
+        # remove type argument e.g. DECIMAL(2, 10) => DECIMAL
+        clear_field_type, _, _ = clear_field_type.partition("(")
+        new_field_type = self.type_map.get(clear_field_type, "STRING")
+
+        return {"name": field[0], "type": new_field_type}
+
+    def convert_type(self, value, schema_type):
+        """
+        Do nothing. Trino uses JSON on the transport layer, so types are simple.
+
+        :param value: Trino column value
+        :type value: Any
+        :param schema_type: BigQuery data type
+        :type schema_type: str
+        """
+        return value
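
The ``field_to_bigquery`` method above strips any type arguments before the lookup and falls back to ``STRING`` for unmapped types. A minimal sketch (not part of the diff; the task id, query, and bucket name are illustrative) showing that behaviour, assuming only the class defined above:

    from airflow.providers.google.cloud.transfers.trino_to_gcs import TrinoToGCSOperator

    op = TrinoToGCSOperator(
        task_id="trino_to_gcs_sketch",          # illustrative task id
        sql="SELECT * FROM memory.default.t",   # illustrative query
        bucket="my-example-bucket",             # hypothetical bucket name
        filename="t.{}.json",
    )

    # DECIMAL(10, 2) -> NUMERIC: the "(10, 2)" argument is stripped before the lookup
    assert op.field_to_bigquery(("price", "decimal(10, 2)")) == {"name": "price", "type": "NUMERIC"}
    # Unknown Trino types fall back to STRING
    assert op.field_to_bigquery(("geo", "geometry")) == {"name": "geo", "type": "STRING"}
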
diff --git a/airflow/providers/google/provider.yaml b/airflow/providers/google/provider.yaml
index 690eb00..d1f5b5f 100644
--- a/airflow/providers/google/provider.yaml
+++ b/airflow/providers/google/provider.yaml
@@ -588,6 +588,10 @@ transfers:
     target-integration-name: Google Cloud Storage (GCS)
     how-to-guide: /docs/apache-airflow-providers-google/operators/transfer/presto_to_gcs.rst
     python-module: airflow.providers.google.cloud.transfers.presto_to_gcs
+  - source-integration-name: Trino
+    target-integration-name: Google Cloud Storage (GCS)
+    how-to-guide: /docs/apache-airflow-providers-google/operators/transfer/trino_to_gcs.rst
+    python-module: airflow.providers.google.cloud.transfers.trino_to_gcs
   - source-integration-name: SQL
     target-integration-name: Google Cloud Storage (GCS)
     python-module: airflow.providers.google.cloud.transfers.sql_to_gcs
diff --git a/airflow/providers/mysql/provider.yaml b/airflow/providers/mysql/provider.yaml
index df3b343..9eea5d1 100644
--- a/airflow/providers/mysql/provider.yaml
+++ b/airflow/providers/mysql/provider.yaml
@@ -48,9 +48,12 @@ transfers:
   - source-integration-name: Amazon Simple Storage Service (S3)
     target-integration-name: MySQL
     python-module: airflow.providers.mysql.transfers.s3_to_mysql
-  - source-integration-name: Snowflake
+  - source-integration-name: Presto
     target-integration-name: MySQL
     python-module: airflow.providers.mysql.transfers.presto_to_mysql
+  - source-integration-name: Trino
+    target-integration-name: MySQL
+    python-module: airflow.providers.mysql.transfers.trino_to_mysql
 
 hook-class-names:
   - airflow.providers.mysql.hooks.mysql.MySqlHook
diff --git a/airflow/providers/mysql/transfers/trino_to_mysql.py b/airflow/providers/mysql/transfers/trino_to_mysql.py
new file mode 100644
index 0000000..b97550e
--- /dev/null
+++ b/airflow/providers/mysql/transfers/trino_to_mysql.py
@@ -0,0 +1,83 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from typing import Dict, Optional
+
+from airflow.models import BaseOperator
+from airflow.providers.mysql.hooks.mysql import MySqlHook
+from airflow.providers.trino.hooks.trino import TrinoHook
+from airflow.utils.decorators import apply_defaults
+
+
+class TrinoToMySqlOperator(BaseOperator):
+    """
+    Moves data from Trino to MySQL. Note that for now the data is loaded
+    into memory before being pushed to MySQL, so this operator should
+    be used for smallish amounts of data.
+
+    :param sql: SQL query to execute against Trino. (templated)
+    :type sql: str
+    :param mysql_table: target MySQL table, use dot notation to target a
+        specific database. (templated)
+    :type mysql_table: str
+    :param mysql_conn_id: source mysql connection
+    :type mysql_conn_id: str
+    :param trino_conn_id: source trino connection
+    :type trino_conn_id: str
+    :param mysql_preoperator: SQL statement to run against MySQL prior to the
+        import, typically used to truncate or delete existing rows so they are
+        replaced by the incoming data, allowing the task to be idempotent (running
+        the task twice won't double-load data). (templated)
+    :type mysql_preoperator: str
+    """
+
+    template_fields = ('sql', 'mysql_table', 'mysql_preoperator')
+    template_ext = ('.sql',)
+    template_fields_renderers = {"mysql_preoperator": "sql"}
+    ui_color = '#a0e08c'
+
+    @apply_defaults
+    def __init__(
+        self,
+        *,
+        sql: str,
+        mysql_table: str,
+        trino_conn_id: str = 'trino_default',
+        mysql_conn_id: str = 'mysql_default',
+        mysql_preoperator: Optional[str] = None,
+        **kwargs,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.sql = sql
+        self.mysql_table = mysql_table
+        self.mysql_conn_id = mysql_conn_id
+        self.mysql_preoperator = mysql_preoperator
+        self.trino_conn_id = trino_conn_id
+
+    def execute(self, context: Dict) -> None:
+        trino = TrinoHook(trino_conn_id=self.trino_conn_id)
+        self.log.info("Extracting data from Trino: %s", self.sql)
+        results = trino.get_records(self.sql)
+
+        mysql = MySqlHook(mysql_conn_id=self.mysql_conn_id)
+        if self.mysql_preoperator:
+            self.log.info("Running MySQL preoperator")
+            self.log.info(self.mysql_preoperator)
+            mysql.run(self.mysql_preoperator)
+
+        self.log.info("Inserting rows into MySQL")
+        mysql.insert_rows(table=self.mysql_table, rows=results)
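
A minimal usage sketch for the operator above (not part of the diff; the query, table, and connection names are hypothetical), showing the ``mysql_preoperator`` pattern described in the docstring:

    from airflow.providers.mysql.transfers.trino_to_mysql import TrinoToMySqlOperator

    load_customers = TrinoToMySqlOperator(
        task_id="load_customers",
        sql="SELECT id, name FROM postgresql.public.customers",  # hypothetical Trino query
        mysql_table="analytics.customers",                       # hypothetical target table
        mysql_preoperator="TRUNCATE TABLE analytics.customers",  # makes the task idempotent
    )
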
diff --git a/airflow/providers/trino/CHANGELOG.rst b/airflow/providers/trino/CHANGELOG.rst
new file mode 100644
index 0000000..cef7dda
--- /dev/null
+++ b/airflow/providers/trino/CHANGELOG.rst
@@ -0,0 +1,25 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+Changelog
+---------
+
+1.0.0
+.....
+
+Initial version of the provider.
diff --git a/scripts/ci/docker-compose/integration-redis.yml b/airflow/providers/trino/__init__.py
similarity index 59%
copy from scripts/ci/docker-compose/integration-redis.yml
copy to airflow/providers/trino/__init__.py
index ab353d2..217e5db 100644
--- a/scripts/ci/docker-compose/integration-redis.yml
+++ b/airflow/providers/trino/__init__.py
@@ -1,3 +1,4 @@
+#
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
 # distributed with this work for additional information
@@ -14,29 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
----
-version: "2.2"
-services:
-  redis:
-    image: redis:5.0.1
-    volumes:
-      - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
-      - redis-db-volume:/data/presto
-    ports:
-      - "${REDIS_HOST_PORT}:6379"
-    healthcheck:
-      test: ["CMD", "redis-cli", "ping"]
-      interval: 5s
-      timeout: 30s
-      retries: 50
-    restart: always
-
-  airflow:
-    environment:
-      - INTEGRATION_REDIS=true
-    depends_on:
-      redis:
-        condition: service_healthy
-
-volumes:
-  redis-db-volume:
diff --git a/scripts/ci/docker-compose/integration-redis.yml b/airflow/providers/trino/hooks/__init__.py
similarity index 59%
copy from scripts/ci/docker-compose/integration-redis.yml
copy to airflow/providers/trino/hooks/__init__.py
index ab353d2..217e5db 100644
--- a/scripts/ci/docker-compose/integration-redis.yml
+++ b/airflow/providers/trino/hooks/__init__.py
@@ -1,3 +1,4 @@
+#
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
 # distributed with this work for additional information
@@ -14,29 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
----
-version: "2.2"
-services:
-  redis:
-    image: redis:5.0.1
-    volumes:
-      - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
-      - redis-db-volume:/data/presto
-    ports:
-      - "${REDIS_HOST_PORT}:6379"
-    healthcheck:
-      test: ["CMD", "redis-cli", "ping"]
-      interval: 5s
-      timeout: 30s
-      retries: 50
-    restart: always
-
-  airflow:
-    environment:
-      - INTEGRATION_REDIS=true
-    depends_on:
-      redis:
-        condition: service_healthy
-
-volumes:
-  redis-db-volume:
diff --git a/airflow/providers/trino/hooks/trino.py b/airflow/providers/trino/hooks/trino.py
new file mode 100644
index 0000000..0914d04
--- /dev/null
+++ b/airflow/providers/trino/hooks/trino.py
@@ -0,0 +1,191 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import os
+from typing import Any, Iterable, Optional
+
+import trino
+from trino.exceptions import DatabaseError
+from trino.transaction import IsolationLevel
+
+from airflow import AirflowException
+from airflow.configuration import conf
+from airflow.hooks.dbapi import DbApiHook
+from airflow.models import Connection
+
+
+class TrinoException(Exception):
+    """Trino exception"""
+
+
+def _boolify(value):
+    if isinstance(value, bool):
+        return value
+    if isinstance(value, str):
+        if value.lower() == 'false':
+            return False
+        elif value.lower() == 'true':
+            return True
+    return value
+
+
+class TrinoHook(DbApiHook):
+    """
+    Interact with Trino through the trino package.
+
+    >>> ph = TrinoHook()
+    >>> sql = "SELECT count(1) AS num FROM airflow.static_babynames"
+    >>> ph.get_records(sql)
+    [[340698]]
+    """
+
+    conn_name_attr = 'trino_conn_id'
+    default_conn_name = 'trino_default'
+    conn_type = 'trino'
+    hook_name = 'Trino'
+
+    def get_conn(self) -> Connection:
+        """Returns a connection object"""
+        db = self.get_connection(
+            self.trino_conn_id  # type: ignore[attr-defined]  # pylint: disable=no-member
+        )
+        extra = db.extra_dejson
+        auth = None
+        if db.password and extra.get('auth') == 'kerberos':
+            raise AirflowException("Kerberos authorization doesn't support password.")
+        elif db.password:
+            auth = trino.auth.BasicAuthentication(db.login, db.password)
+        elif extra.get('auth') == 'kerberos':
+            auth = trino.auth.KerberosAuthentication(
+                config=extra.get('kerberos__config', os.environ.get('KRB5_CONFIG')),
+                service_name=extra.get('kerberos__service_name'),
+                mutual_authentication=_boolify(extra.get('kerberos__mutual_authentication', False)),
+                force_preemptive=_boolify(extra.get('kerberos__force_preemptive', False)),
+                hostname_override=extra.get('kerberos__hostname_override'),
+                sanitize_mutual_error_response=_boolify(
+                    extra.get('kerberos__sanitize_mutual_error_response', True)
+                ),
+                principal=extra.get('kerberos__principal', conf.get('kerberos', 'principal')),
+                delegate=_boolify(extra.get('kerberos__delegate', False)),
+                ca_bundle=extra.get('kerberos__ca_bundle'),
+            )
+
+        trino_conn = trino.dbapi.connect(
+            host=db.host,
+            port=db.port,
+            user=db.login,
+            source=db.extra_dejson.get('source', 'airflow'),
+            http_scheme=db.extra_dejson.get('protocol', 'http'),
+            catalog=db.extra_dejson.get('catalog', 'hive'),
+            schema=db.schema,
+            auth=auth,
+            isolation_level=self.get_isolation_level(),  # type: ignore[func-returns-value]
+        )
+        if extra.get('verify') is not None:
+            # Unfortunately the verify parameter is not yet available via the public API.
+            # The PR adding it has been merged in the trino library, but has not been released.
+            # See: https://github.com/trinodb/trino-python-client/pull/31
+            trino_conn._http_session.verify = _boolify(extra['verify'])  # pylint: disable=protected-access
+
+        return trino_conn
+
+    def get_isolation_level(self) -> Any:
+        """Returns an isolation level"""
+        db = self.get_connection(
+            self.trino_conn_id  # type: ignore[attr-defined]  # pylint: disable=no-member
+        )
+        isolation_level = db.extra_dejson.get('isolation_level', 'AUTOCOMMIT').upper()
+        return getattr(IsolationLevel, isolation_level, IsolationLevel.AUTOCOMMIT)
+
+    @staticmethod
+    def _strip_sql(sql: str) -> str:
+        return sql.strip().rstrip(';')
+
+    def get_records(self, hql, parameters: Optional[dict] = None):
+        """Get a set of records from Trino"""
+        try:
+            return super().get_records(self._strip_sql(hql), parameters)
+        except DatabaseError as e:
+            raise TrinoException(e)
+
+    def get_first(self, hql: str, parameters: Optional[dict] = None) -> Any:
+        """Returns only the first row, regardless of how many rows the query returns."""
+        try:
+            return super().get_first(self._strip_sql(hql), parameters)
+        except DatabaseError as e:
+            raise TrinoException(e)
+
+    def get_pandas_df(self, hql, parameters=None, **kwargs):
+        """Get a pandas dataframe from a sql query."""
+        import pandas
+
+        cursor = self.get_cursor()
+        try:
+            cursor.execute(self._strip_sql(hql), parameters)
+            data = cursor.fetchall()
+        except DatabaseError as e:
+            raise TrinoException(e)
+        column_descriptions = cursor.description
+        if data:
+            df = pandas.DataFrame(data, **kwargs)
+            df.columns = [c[0] for c in column_descriptions]
+        else:
+            df = pandas.DataFrame(**kwargs)
+        return df
+
+    def run(
+        self,
+        hql,
+        autocommit: bool = False,
+        parameters: Optional[dict] = None,
+    ) -> None:
+        """Execute the statement against Trino. Can be used to create views."""
+        return super().run(sql=self._strip_sql(hql), parameters=parameters)
+
+    def insert_rows(
+        self,
+        table: str,
+        rows: Iterable[tuple],
+        target_fields: Optional[Iterable[str]] = None,
+        commit_every: int = 0,
+        replace: bool = False,
+        **kwargs,
+    ) -> None:
+        """
+        A generic way to insert a set of tuples into a table.
+
+        :param table: Name of the target table
+        :type table: str
+        :param rows: The rows to insert into the table
+        :type rows: iterable of tuples
+        :param target_fields: The names of the columns to fill in the table
+        :type target_fields: iterable of strings
+        :param commit_every: The maximum number of rows to insert in one
+            transaction. Set to 0 to insert all rows in one transaction.
+        :type commit_every: int
+        :param replace: Whether to replace instead of insert
+        :type replace: bool
+        """
+        if self.get_isolation_level() == IsolationLevel.AUTOCOMMIT:
+            self.log.info(
+                'Transactions are not enabled in the Trino connection. '
+                'Please use the isolation_level property to enable them. '
+                'Falling back to inserting all rows in one transaction.'
+            )
+            commit_every = 0
+
+        super().insert_rows(table, rows, target_fields, commit_every)
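
The ``get_conn`` implementation above reads everything beyond host/port/login from the connection's ``extra`` JSON. A hedged sketch of such a connection definition (not part of the diff; host, login, and values are illustrative, and only keys that ``get_conn``/``get_isolation_level`` actually read are used):

    import json

    from airflow.models import Connection

    # Hypothetical connection record; it could be created via `airflow connections add`
    # or merged the same way as the trino_default entry added to airflow/utils/db.py above.
    trino_conn = Connection(
        conn_id="trino_default",
        conn_type="trino",
        host="trino.example.com",   # illustrative host
        port=8080,
        login="analyst",
        schema="default",
        extra=json.dumps(
            {
                "protocol": "https",             # passed as http_scheme to trino.dbapi.connect
                "catalog": "hive",
                "verify": "false",               # applied to the underlying HTTP session
                "isolation_level": "READ_COMMITTED",
                "auth": "kerberos",              # switches the hook to KerberosAuthentication
                "kerberos__service_name": "HTTP",
                "kerberos__config": "/etc/krb5.conf",
            }
        ),
    )
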
diff --git a/scripts/ci/docker-compose/integration-redis.yml b/airflow/providers/trino/provider.yaml
similarity index 60%
copy from scripts/ci/docker-compose/integration-redis.yml
copy to airflow/providers/trino/provider.yaml
index ab353d2..a59aaae 100644
--- a/scripts/ci/docker-compose/integration-redis.yml
+++ b/airflow/providers/trino/provider.yaml
@@ -14,29 +14,26 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+
 ---
-version: "2.2"
-services:
-  redis:
-    image: redis:5.0.1
-    volumes:
-      - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
-      - redis-db-volume:/data/presto
-    ports:
-      - "${REDIS_HOST_PORT}:6379"
-    healthcheck:
-      test: ["CMD", "redis-cli", "ping"]
-      interval: 5s
-      timeout: 30s
-      retries: 50
-    restart: always
+package-name: apache-airflow-providers-trino
+name: Trino
+description: |
+    `Trino <https://trino.io/>`__
+
+versions:
+  - 1.0.0
+
+integrations:
+  - integration-name: Trino
+    external-doc-url: https://trino.io/docs/
+    logo: /integration-logos/trino/trino-og.png
+    tags: [software]
 
-  airflow:
-    environment:
-      - INTEGRATION_REDIS=true
-    depends_on:
-      redis:
-        condition: service_healthy
+hooks:
+  - integration-name: Trino
+    python-modules:
+      - airflow.providers.trino.hooks.trino
 
-volumes:
-  redis-db-volume:
+hook-class-names:
+  - airflow.providers.trino.hooks.trino.TrinoHook
diff --git a/airflow/sensors/sql.py b/airflow/sensors/sql.py
index 573c7cd..2a76ea1 100644
--- a/airflow/sensors/sql.py
+++ b/airflow/sensors/sql.py
@@ -84,6 +84,7 @@ class SqlSensor(BaseSensorOperator):
             'presto',
             'snowflake',
             'sqlite',
+            'trino',
             'vertica',
         }
         if conn.conn_type not in allowed_conn_type:
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index 20b4b0b..4a9816c 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -510,6 +510,16 @@ def create_default_connections(session=None):
     )
     merge_conn(
         Connection(
+            conn_id="trino_default",
+            conn_type="trino",
+            host="localhost",
+            schema="hive",
+            port=3400,
+        ),
+        session,
+    )
+    merge_conn(
+        Connection(
             conn_id="vertica_default",
             conn_type="vertica",
             host="localhost",
diff --git a/breeze b/breeze
index c85a5ac..302cebb 100755
--- a/breeze
+++ b/breeze
@@ -819,7 +819,7 @@ function breeze::parse_arguments() {
             else
                 INTEGRATIONS+=("${INTEGRATION}")
             fi
-            if [[ " ${INTEGRATIONS[*]} " =~ " presto " ]]; then
+            if [[ " ${INTEGRATIONS[*]} " =~ " trino " ]]; then
               INTEGRATIONS+=("kerberos");
             fi
             echo
diff --git a/breeze-complete b/breeze-complete
index 83dfe9f..ff13c27 100644
--- a/breeze-complete
+++ b/breeze-complete
@@ -25,7 +25,7 @@
 
 _breeze_allowed_python_major_minor_versions="2.7 3.5 3.6 3.7 3.8"
 _breeze_allowed_backends="sqlite mysql postgres"
-_breeze_allowed_integrations="cassandra kerberos mongo openldap pinot presto rabbitmq redis statsd all"
+_breeze_allowed_integrations="cassandra kerberos mongo openldap pinot rabbitmq redis statsd trino all"
 _breeze_allowed_generate_constraints_modes="source-providers pypi-providers no-providers"
 # registrys is good here even if it is not correct english. We are adding s automatically to all variables
 _breeze_allowed_github_registrys="docker.pkg.github.com ghcr.io"
diff --git a/docs/apache-airflow-providers-google/operators/transfer/trino_to_gcs.rst b/docs/apache-airflow-providers-google/operators/transfer/trino_to_gcs.rst
new file mode 100644
index 0000000..29dc540
--- /dev/null
+++ b/docs/apache-airflow-providers-google/operators/transfer/trino_to_gcs.rst
@@ -0,0 +1,142 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+Trino to Google Cloud Storage Transfer Operator
+===============================================
+
+`Trino <https://trino.io/>`__ is an open source, fast, distributed SQL query engine for running interactive
+analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Trino allows
+querying data where it lives, including Hive, Cassandra, relational databases or even proprietary data stores.
+A single Trino query can combine data from multiple sources, allowing for analytics across your entire
+organization.
+
+`Google Cloud Storage <https://cloud.google.com/storage/>`__ allows world-wide storage and retrieval of
+any amount of data at any time. You can use it to store backup and
+`archive data <https://cloud.google.com/storage/archival>`__ as well
+as a `data source for BigQuery <https://cloud.google.com/bigquery/external-data-cloud-storage>`__.
+
+
+Data transfer
+-------------
+
+Transferring files between Trino and Google Cloud Storage is performed with the
+:class:`~airflow.providers.google.cloud.transfers.trino_to_gcs.TrinoToGCSOperator` operator.
+
+This operator has 3 required parameters:
+
+* ``sql`` - The SQL to execute.
+* ``bucket`` - The bucket to upload to.
+* ``filename`` - The filename to use as the object name when uploading to Google Cloud Storage.
+  A ``{}`` should be specified in the filename to allow the operator to inject file
+  numbers in cases where the file is split due to size.
+
+All parameters are described in the reference documentation - :class:`~airflow.providers.google.cloud.transfers.trino_to_gcs.TrinoToGCSOperator`.
+
+An example operator call might look like this:
+
+.. exampleinclude:: /../../airflow/providers/google/cloud/example_dags/example_trino_to_gcs.py
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_operator_trino_to_gcs_basic]
+    :end-before: [END howto_operator_trino_to_gcs_basic]
+
+Choice of data format
+^^^^^^^^^^^^^^^^^^^^^
+
+The operator supports two output formats:
+
+* ``json`` - JSON Lines (default)
+* ``csv``
+
+You can select the format with the ``export_format`` parameter.
+
+If you want a CSV file to be created, your operator call might look like this:
+
+.. exampleinclude:: /../../airflow/providers/google/cloud/example_dags/example_trino_to_gcs.py
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_operator_trino_to_gcs_csv]
+    :end-before: [END howto_operator_trino_to_gcs_csv]
+
+Generating BigQuery schema
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you set the ``schema_filename`` parameter, a ``.json`` file containing the BigQuery schema fields for the table
+will be dumped from the database and uploaded to the bucket.
+
+If you want to create a schema file, then an example operator call might look like this:
+
+.. exampleinclude:: /../../airflow/providers/google/cloud/example_dags/example_trino_to_gcs.py
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_operator_trino_to_gcs_multiple_types]
+    :end-before: [END howto_operator_trino_to_gcs_multiple_types]
+
+For more information about the BigQuery schema, please look at
+`Specifying schema <https://cloud.google.com/bigquery/docs/schemas>`__ in the BigQuery documentation.
+
+Division of the result into multiple files
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This operator supports the ability to split a large result into multiple files. The ``approx_max_file_size_bytes``
+parameter allows developers to specify the approximate file size of the splits. By default, each file has no more
+than 1 900 000 000 bytes (1900 MB).
+
+Check `Quotas & limits in Google Cloud Storage <https://cloud.google.com/storage/quotas>`__ to see the
+maximum allowed file size for a single object.
+
+If you want to create 10 MB files, your code might look like this:
+
+.. exampleinclude:: /../../airflow/providers/google/cloud/example_dags/example_trino_to_gcs.py
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_operator_trino_to_gcs_many_chunks]
+    :end-before: [END howto_operator_trino_to_gcs_many_chunks]
+
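
Pulling the options discussed above together, a hedged sketch of a single operator call (not part of the diff; the bucket and table names are illustrative) that writes roughly 10 MB JSON chunks and also dumps a BigQuery schema file:

    from airflow.providers.google.cloud.transfers.trino_to_gcs import TrinoToGCSOperator

    trino_to_gcs_chunked = TrinoToGCSOperator(
        task_id="trino_to_gcs_chunked",
        sql="SELECT * FROM memory.default.test_many_chunks",  # illustrative source table
        bucket="my-example-bucket",                            # hypothetical bucket
        filename="test_many_chunks.{}.json",                   # {} is replaced by the chunk number
        schema_filename="test_many_chunks-schema.json",        # BigQuery schema dumped alongside the data
        approx_max_file_size_bytes=10 * 1024 * 1024,           # split output into ~10 MB files
    )
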
+Querying data using BigQuery
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The data available in Google Cloud Storage can be used by BigQuery. You can load data into BigQuery or
+refer to GCS data directly in your queries. For information about loading data into BigQuery, please look at
+`Introduction to loading data from Cloud Storage <https://cloud.google.com/bigquery/docs/loading-data-cloud-storage>`__
+in the BigQuery documentation. For information about querying GCS data, please look at
+`Querying Cloud Storage data <https://cloud.google.com/bigquery/external-data-cloud-storage>`__ in
+the BigQuery documentation.
+
+Airflow also has numerous operators that allow you to work with BigQuery.
+For example, if you want to create an external table that allows queries to
+read data directly from GCS, you can use :class:`~airflow.providers.google.cloud.operators.bigquery.BigQueryCreateExternalTableOperator`.
+Using this operator looks like this:
+
+.. exampleinclude:: /../../airflow/providers/google/cloud/example_dags/example_trino_to_gcs.py
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_operator_create_external_table_multiple_types]
+    :end-before: [END howto_operator_create_external_table_multiple_types]
+
+For more information about the Airflow and BigQuery integration, please look at
+the Python API Reference - :class:`~airflow.providers.google.cloud.operators.bigquery`.
+
+Reference
+^^^^^^^^^
+
+For further information, look at:
+
+* `Trino Documentation <https://trino.io/docs/current/>`__
+
+* `Google Cloud Storage Documentation <https://cloud.google.com/storage/docs/>`__
diff --git a/docs/apache-airflow-providers-trino/commits.rst b/docs/apache-airflow-providers-trino/commits.rst
new file mode 100644
index 0000000..5f0341d
--- /dev/null
+++ b/docs/apache-airflow-providers-trino/commits.rst
@@ -0,0 +1,26 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Package apache-airflow-providers-trino
+------------------------------------------------------
+
+`Trino <https://trino.io/>`__
+
+
+This is a detailed commit list of changes for versions of the provider package: ``trino``.
+For the high-level changelog, see :doc:`package information including changelog <index>`.
diff --git a/docs/apache-airflow-providers-trino/index.rst b/docs/apache-airflow-providers-trino/index.rst
new file mode 100644
index 0000000..e74c7d6
--- /dev/null
+++ b/docs/apache-airflow-providers-trino/index.rst
@@ -0,0 +1,43 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+``apache-airflow-providers-trino``
+===================================
+
+Content
+-------
+
+.. toctree::
+    :maxdepth: 1
+    :caption: References
+
+    Python API <_api/airflow/providers/trino/index>
+
+.. toctree::
+    :maxdepth: 1
+    :caption: Resources
+
+    PyPI Repository <https://pypi.org/project/apache-airflow-providers-trino/>
+
+.. THE REMAINDER OF THE FILE IS AUTOMATICALLY GENERATED. IT WILL BE OVERWRITTEN AT RELEASE TIME!
+
+.. toctree::
+    :maxdepth: 1
+    :caption: Commits
+
+    Detailed list of commits <commits>
diff --git a/docs/apache-airflow/extra-packages-ref.rst b/docs/apache-airflow/extra-packages-ref.rst
index 5221beb..601c6bc 100644
--- a/docs/apache-airflow/extra-packages-ref.rst
+++ b/docs/apache-airflow/extra-packages-ref.rst
@@ -54,7 +54,7 @@ python dependencies for the provided package.
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
 | google_auth         | ``pip install 'apache-airflow[google_auth]'``       | Google auth backend                                                        |
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
-| kerberos            | ``pip install 'apache-airflow[kerberos]'``          | Kerberos integration for Kerberized services (Hadoop, Presto)              |
+| kerberos            | ``pip install 'apache-airflow[kerberos]'``          | Kerberos integration for Kerberized services (Hadoop, Presto, Trino)       |
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
 | ldap                | ``pip install 'apache-airflow[ldap]'``              | LDAP authentication for users                                              |
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
@@ -233,6 +233,8 @@ Those are extras that add dependencies needed for integration with other softwar
 +---------------------+-----------------------------------------------------+-------------------------------------------+
 | singularity         | ``pip install 'apache-airflow[singularity]'``       | Singularity container operator            |
 +---------------------+-----------------------------------------------------+-------------------------------------------+
+| trino               | ``pip install 'apache-airflow[trino]'``             | All Trino related operators & hooks       |
++---------------------+-----------------------------------------------------+-------------------------------------------+
 
 
 Other extras
diff --git a/docs/exts/docs_build/errors.py b/docs/exts/docs_build/errors.py
index 1a2ae06..3fe9f36 100644
--- a/docs/exts/docs_build/errors.py
+++ b/docs/exts/docs_build/errors.py
@@ -69,7 +69,7 @@ def display_errors_summary(build_errors: Dict[str, List[DocBuildError]]) -> None
             console.print("-" * 30, f"[red]Error {warning_no:3}[/]", "-" * 20)
             console.print(error.message)
             console.print()
-            if error.file_path and error.file_path != "<unknown>" and error.line_no:
+            if error.file_path and not error.file_path.endswith("<unknown>") and error.line_no:
                 console.print(
                     f"File path: {os.path.relpath(error.file_path, start=DOCS_DIR)} ({error.line_no})"
                 )
diff --git a/docs/exts/docs_build/spelling_checks.py b/docs/exts/docs_build/spelling_checks.py
index 4d3c26d..41a54c8 100644
--- a/docs/exts/docs_build/spelling_checks.py
+++ b/docs/exts/docs_build/spelling_checks.py
@@ -176,6 +176,6 @@ def _display_error(error: SpellingError):
             console.print(f"Suggested Spelling: '{error.suggestion}'")
         if error.context_line:
             console.print(f"Line with Error: '{error.context_line}'")
-        if error.line_no:
+        if error.file_path and not error.file_path.endswith("<unknown>") and error.line_no:
             console.print(f"Line Number: {error.line_no}")
             console.print(prepare_code_snippet(error.file_path, error.line_no))
diff --git a/docs/integration-logos/trino/trino-og.png b/docs/integration-logos/trino/trino-og.png
new file mode 100644
index 0000000..55bedf9
Binary files /dev/null and b/docs/integration-logos/trino/trino-og.png differ
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index 5284966..ace29d3 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -1324,6 +1324,7 @@ tooltips
 traceback
 tracebacks
 travis
+trino
 trojan
 tsv
 ttl
diff --git a/scripts/ci/docker-compose/integration-kerberos.yml b/scripts/ci/docker-compose/integration-kerberos.yml
index d157bd6..95fc8c9 100644
--- a/scripts/ci/docker-compose/integration-kerberos.yml
+++ b/scripts/ci/docker-compose/integration-kerberos.yml
@@ -36,7 +36,7 @@ services:
         /opt/kerberos-utils/create_client.sh bob bob /root/kerberos-keytabs/airflow.keytab;
         /opt/kerberos-utils/create_service.sh krb5-machine-example-com airflow
         /root/kerberos-keytabs/airflow.keytab;
-        /opt/kerberos-utils/create_service.sh presto HTTP /root/kerberos-keytabs/presto.keytab;
+        /opt/kerberos-utils/create_service.sh trino HTTP /root/kerberos-keytabs/trino.keytab;
     healthcheck:
       test: |-
         python -c "
diff --git a/scripts/ci/docker-compose/integration-redis.yml b/scripts/ci/docker-compose/integration-redis.yml
index ab353d2..3cdf68c 100644
--- a/scripts/ci/docker-compose/integration-redis.yml
+++ b/scripts/ci/docker-compose/integration-redis.yml
@@ -21,7 +21,7 @@ services:
     image: redis:5.0.1
     volumes:
       - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
-      - redis-db-volume:/data/presto
+      - redis-db-volume:/data/redis
     ports:
       - "${REDIS_HOST_PORT}:6379"
     healthcheck:
diff --git a/scripts/ci/docker-compose/integration-presto.yml b/scripts/ci/docker-compose/integration-trino.yml
similarity index 81%
rename from scripts/ci/docker-compose/integration-presto.yml
rename to scripts/ci/docker-compose/integration-trino.yml
index 7fce206..3f420fb 100644
--- a/scripts/ci/docker-compose/integration-presto.yml
+++ b/scripts/ci/docker-compose/integration-trino.yml
@@ -17,10 +17,10 @@
 ---
 version: "2.2"
 services:
-  presto:
-    image: apache/airflow:presto-2020.10.08
-    container_name: presto
-    hostname: presto
+  trino:
+    image: apache/airflow:trino-2021.04.04
+    container_name: trino
+    hostname: trino
     domainname: example.com
 
     networks:
@@ -40,19 +40,19 @@ services:
     volumes:
       - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
       - ../dockerfiles/krb5-kdc-server/krb5.conf:/etc/krb5.conf:ro
-      - presto-db-volume:/data/presto
-      - kerberos-keytabs:/home/presto/kerberos-keytabs
+      - trino-db-volume:/data/trino
+      - kerberos-keytabs:/home/trino/kerberos-keytabs
 
     environment:
       - KRB5_CONFIG=/etc/krb5.conf
       - KRB5_TRACE=/dev/stderr
-      - KRB5_KTNAME=/home/presto/kerberos-keytabs/presto.keytab
+      - KRB5_KTNAME=/home/trino/kerberos-keytabs/trino.keytab
   airflow:
     environment:
-      - INTEGRATION_PRESTO=true
+      - INTEGRATION_TRINO=true
     depends_on:
-      presto:
+      trino:
         condition: service_healthy
 
 volumes:
-  presto-db-volume:
+  trino-db-volume:
diff --git a/scripts/ci/dockerfiles/krb5-kdc-server/utils/create_service.sh b/scripts/ci/dockerfiles/krb5-kdc-server/utils/create_service.sh
index 30161a3..c92aeab 100755
--- a/scripts/ci/dockerfiles/krb5-kdc-server/utils/create_service.sh
+++ b/scripts/ci/dockerfiles/krb5-kdc-server/utils/create_service.sh
@@ -29,7 +29,7 @@ Usage: ${CMDNAME} <service_name> <service_type> <keytab_file>
 Creates an account for the service.
 
 The service name is combined with the domain to create an principal name. If your service is named
-\"presto\" a principal \"presto.example.com\" will be created.
+\"trino\" a principal \"trino.example.com\" will be created.
 
 The protocol can have any value, but it must be identical in the server and client configuration.
 For example: HTTP.
diff --git a/scripts/ci/dockerfiles/presto/Dockerfile b/scripts/ci/dockerfiles/trino/Dockerfile
similarity index 78%
rename from scripts/ci/dockerfiles/presto/Dockerfile
rename to scripts/ci/dockerfiles/trino/Dockerfile
index 80ccbfd..080491f 100644
--- a/scripts/ci/dockerfiles/presto/Dockerfile
+++ b/scripts/ci/dockerfiles/trino/Dockerfile
@@ -14,8 +14,8 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-ARG PRESTO_VERSION="330"
-FROM prestosql/presto:${PRESTO_VERSION}
+ARG TRINO_VERSION="354"
+FROM trinodb/trino:${TRINO_VERSION}
 
 # Obtain root privileges
 USER 0
@@ -23,16 +23,16 @@ USER 0
 # Setup entrypoint
 COPY entrypoint.sh /entrypoint.sh
 ENTRYPOINT ["/entrypoint.sh"]
-CMD ["/usr/lib/presto/bin/run-presto"]
+CMD ["/usr/lib/trino/bin/run-trino"]
 
 # Expose HTTPS
 EXPOSE 7778
 
-LABEL org.apache.airflow.component="presto"
-LABEL org.apache.airflow.presto.core.version="${PRESTO_VERSION}"
-LABEL org.apache.airflow.airflow_bats.version="${AIRFLOW_PRESTO_VERSION}"
+LABEL org.apache.airflow.component="trino"
+LABEL org.apache.airflow.trino.core.version="${TRINO_VERSION}"
+LABEL org.apache.airflow.airflow_trino.version="${AIRFLOW_TRINO_VERSION}"
 LABEL org.apache.airflow.commit_sha="${COMMIT_SHA}"
 LABEL maintainer="Apache Airflow Community <de...@airflow.apache.org>"
 
 # Restore user
-USER presto:presto
+USER trino:trino
diff --git a/scripts/ci/dockerfiles/presto/build_and_push.sh b/scripts/ci/dockerfiles/trino/build_and_push.sh
similarity index 79%
rename from scripts/ci/dockerfiles/presto/build_and_push.sh
rename to scripts/ci/dockerfiles/trino/build_and_push.sh
index d3cac47..ea8a59d 100755
--- a/scripts/ci/dockerfiles/presto/build_and_push.sh
+++ b/scripts/ci/dockerfiles/trino/build_and_push.sh
@@ -21,24 +21,24 @@ DOCKERHUB_REPO=${DOCKERHUB_REPO:="airflow"}
 readonly DOCKERHUB_USER
 readonly DOCKERHUB_REPO
 
-PRESTO_VERSION="330"
-readonly PRESTO_VERSION
+TRINO_VERSION="354"
+readonly TRINO_VERSION
 
-AIRFLOW_PRESTO_VERSION="2020.10.08"
-readonly AIRFLOW_PRESTO_VERSION
+AIRFLOW_TRINO_VERSION="2021.04.04"
+readonly AIRFLOW_TRINO_VERSION
 
 COMMIT_SHA=$(git rev-parse HEAD)
 readonly COMMIT_SHA
 
 cd "$( dirname "${BASH_SOURCE[0]}" )" || exit 1
 
-TAG="${DOCKERHUB_USER}/${DOCKERHUB_REPO}:presto-${AIRFLOW_PRESTO_VERSION}"
+TAG="${DOCKERHUB_USER}/${DOCKERHUB_REPO}:trino-${AIRFLOW_TRINO_VERSION}"
 readonly TAG
 
 docker build . \
     --pull \
-    --build-arg "PRESTO_VERSION=${PRESTO_VERSION}" \
-    --build-arg "AIRFLOW_PRESTO_VERSION=${AIRFLOW_PRESTO_VERSION}" \
+    --build-arg "TRINO_VERSION=${TRINO_VERSION}" \
+    --build-arg "AIRFLOW_TRINO_VERSION=${AIRFLOW_TRINO_VERSION}" \
     --build-arg "COMMIT_SHA=${COMMIT_SHA}" \
     --tag "${TAG}"
 
diff --git a/scripts/ci/dockerfiles/presto/entrypoint.sh b/scripts/ci/dockerfiles/trino/entrypoint.sh
similarity index 73%
rename from scripts/ci/dockerfiles/presto/entrypoint.sh
rename to scripts/ci/dockerfiles/trino/entrypoint.sh
index 9c8d113..314cc5a 100755
--- a/scripts/ci/dockerfiles/presto/entrypoint.sh
+++ b/scripts/ci/dockerfiles/trino/entrypoint.sh
@@ -32,7 +32,7 @@ function check_service {
         RES=$?
         set -e
         if [[ ${RES} == 0 ]]; then
-            echo  "${COLOR_GREEN}OK.  ${COLOR_RESET}"
+            echo  "OK."
             break
         else
             echo -n "."
@@ -58,27 +58,29 @@ function log() {
   echo -e "\u001b[32m[$(date +'%Y-%m-%dT%H:%M:%S%z')]: $*\u001b[0m"
 }
 
-if [ -f /tmp/presto-initiaalized ]; then
+if [ -f /tmp/trino-initialized ]; then
   exec /bin/sh -c "$@"
 fi
 
-PRESTO_CONFIG_FILE="/usr/lib/presto/default/etc/config.properties"
-JVM_CONFIG_FILE="/usr/lib/presto/default/etc/jvm.config"
+TRINO_CONFIG_FILE="/etc/trino/config.properties"
+JVM_CONFIG_FILE="/etc/trino/jvm.config"
 
 log "Generate self-signed SSL certificate"
 JKS_KEYSTORE_FILE=/tmp/ssl_keystore.jks
-JKS_KEYSTORE_PASS=presto
+JKS_KEYSTORE_PASS=trinodb
+keytool -delete --alias "trino-ssl" -keystore "${JKS_KEYSTORE_FILE}" -storepass "${JKS_KEYSTORE_PASS}" || true
+
 keytool \
     -genkeypair \
-    -alias "presto-ssl" \
+    -alias "trino-ssl" \
     -keyalg RSA \
     -keystore "${JKS_KEYSTORE_FILE}" \
     -validity 10000 \
     -dname "cn=Unknown, ou=Unknown, o=Unknown, c=Unknown"\
     -storepass "${JKS_KEYSTORE_PASS}"
 
-log "Set up SSL in ${PRESTO_CONFIG_FILE}"
-cat << EOF >> "${PRESTO_CONFIG_FILE}"
+log "Set up SSL in ${TRINO_CONFIG_FILE}"
+cat << EOF >> "${TRINO_CONFIG_FILE}"
 http-server.https.enabled=true
 http-server.https.port=7778
 http-server.https.keystore.path=${JKS_KEYSTORE_FILE}
@@ -86,9 +88,18 @@ http-server.https.keystore.key=${JKS_KEYSTORE_PASS}
 node.internal-address-source=FQDN
 EOF
 
+log "Set up memory limits in ${TRINO_CONFIG_FILE}"
+cat << EOF >> "${TRINO_CONFIG_FILE}"
+memory.heap-headroom-per-node=128MB
+query.max-memory-per-node=512MB
+query.max-total-memory-per-node=512MB
+EOF
+
+sed -i  "s/Xmx.*$/Xmx640M/" "${JVM_CONFIG_FILE}"
+
 if [[ -n "${KRB5_CONFIG=}" ]]; then
-    log "Set up Kerberos in ${PRESTO_CONFIG_FILE}"
-    cat << EOF >> "${PRESTO_CONFIG_FILE}"
+    log "Set up Kerberos in ${TRINO_CONFIG_FILE}"
+    cat << EOF >> "${TRINO_CONFIG_FILE}"
 http-server.https.enabled=true
 http-server.https.port=7778
 http-server.https.keystore.path=${JKS_KEYSTORE_FILE}
@@ -103,16 +114,18 @@ EOF
 EOF
 fi
 
-log "Waiting for keytab:${KRB5_KTNAME}"
-check_service "Keytab" "test -f ${KRB5_KTNAME}" 30
+if [[ -n "${KRB5_CONFIG=}" ]]; then
+    log "Waiting for keytab:${KRB5_KTNAME}"
+    check_service "Keytab" "test -f ${KRB5_KTNAME}" 30
+fi
 
-touch /tmp/presto-initiaalized
+touch /tmp/trino-initialized
 
 echo "Config: ${JVM_CONFIG_FILE}"
 cat "${JVM_CONFIG_FILE}"
 
-echo "Config: ${PRESTO_CONFIG_FILE}"
-cat "${PRESTO_CONFIG_FILE}"
+echo "Config: ${TRINO_CONFIG_FILE}"
+cat "${TRINO_CONFIG_FILE}"
 
 log "Executing cmd: ${*}"
 exec /bin/sh -c "${@}"
diff --git a/scripts/ci/libraries/_initialization.sh b/scripts/ci/libraries/_initialization.sh
index f82cb55..1098e4a 100644
--- a/scripts/ci/libraries/_initialization.sh
+++ b/scripts/ci/libraries/_initialization.sh
@@ -179,7 +179,7 @@ function initialization::initialize_dockerhub_variables() {
 
 # Determine available integrations
 function initialization::initialize_available_integrations() {
-    export AVAILABLE_INTEGRATIONS="cassandra kerberos mongo openldap pinot presto rabbitmq redis statsd"
+    export AVAILABLE_INTEGRATIONS="cassandra kerberos mongo openldap pinot rabbitmq redis statsd trino"
 }
 
 # Needs to be declared outside of function for MacOS
diff --git a/scripts/in_container/check_environment.sh b/scripts/in_container/check_environment.sh
index 801477e..22c6fe5 100755
--- a/scripts/in_container/check_environment.sh
+++ b/scripts/in_container/check_environment.sh
@@ -160,17 +160,17 @@ check_integration "MongoDB" "mongo" "run_nc mongo 27017" 50
 check_integration "Redis" "redis" "run_nc redis 6379" 50
 check_integration "Cassandra" "cassandra" "run_nc cassandra 9042" 50
 check_integration "OpenLDAP" "openldap" "run_nc openldap 389" 50
-check_integration "Presto (HTTP)" "presto" "run_nc presto 8080" 50
-check_integration "Presto (HTTPS)" "presto" "run_nc presto 7778" 50
-check_integration "Presto (API)" "presto" \
-    "curl --max-time 1 http://presto:8080/v1/info/ | grep '\"starting\":false'" 50
+check_integration "Trino (HTTP)" "trino" "run_nc trino 8080" 50
+check_integration "Trino (HTTPS)" "trino" "run_nc trino 7778" 50
+check_integration "Trino (API)" "trino" \
+    "curl --max-time 1 http://trino:8080/v1/info/ | grep '\"starting\":false'" 50
 check_integration "Pinot (HTTP)" "pinot" "run_nc pinot 9000" 50
 CMD="curl --max-time 1 -X GET 'http://pinot:9000/health' -H 'accept: text/plain' | grep OK"
-check_integration "Presto (Controller API)" "pinot" "${CMD}" 50
+check_integration "Pinot (Controller API)" "pinot" "${CMD}" 50
 CMD="curl --max-time 1 -X GET 'http://pinot:9000/pinot-controller/admin' -H 'accept: text/plain' | grep GOOD"
-check_integration "Presto (Controller API)" "pinot" "${CMD}" 50
+check_integration "Pinot (Controller API)" "pinot" "${CMD}" 50
 CMD="curl --max-time 1 -X GET 'http://pinot:8000/health' -H 'accept: text/plain' | grep OK"
-check_integration "Presto (Broker API)" "pinot" "${CMD}" 50
+check_integration "Pinot (Broker API)" "pinot" "${CMD}" 50
 check_integration "RabbitMQ" "rabbitmq" "run_nc rabbitmq 5672" 50
 
 echo "-----------------------------------------------------------------------------------------------"
diff --git a/scripts/in_container/run_install_and_test_provider_packages.sh b/scripts/in_container/run_install_and_test_provider_packages.sh
index ebd1f77..f6d31b6 100755
--- a/scripts/in_container/run_install_and_test_provider_packages.sh
+++ b/scripts/in_container/run_install_and_test_provider_packages.sh
@@ -95,7 +95,7 @@ function discover_all_provider_packages() {
     # Columns is to force it wider, so it doesn't wrap at 80 characters
     COLUMNS=180 airflow providers list
 
-    local expected_number_of_providers=64
+    local expected_number_of_providers=66
     local actual_number_of_providers
     actual_providers=$(airflow providers list --output yaml | grep package_name)
     actual_number_of_providers=$(wc -l <<<"$actual_providers")
@@ -118,7 +118,7 @@ function discover_all_hooks() {
     group_start "Listing available hooks via 'airflow providers hooks'"
     COLUMNS=180 airflow providers hooks
 
-    local expected_number_of_hooks=60
+    local expected_number_of_hooks=63
     local actual_number_of_hooks
     actual_number_of_hooks=$(airflow providers hooks --output table | grep -c "| apache" | xargs)
     if [[ ${actual_number_of_hooks} != "${expected_number_of_hooks}" ]]; then
@@ -157,7 +157,7 @@ function discover_all_connection_form_widgets() {
 
     COLUMNS=180 airflow providers widgets
 
-    local expected_number_of_widgets=19
+    local expected_number_of_widgets=25
     local actual_number_of_widgets
     actual_number_of_widgets=$(airflow providers widgets --output table | grep -c ^extra)
     if [[ ${actual_number_of_widgets} != "${expected_number_of_widgets}" ]]; then
@@ -176,7 +176,7 @@ function discover_all_field_behaviours() {
     group_start "Listing connections with custom behaviours via 'airflow providers behaviours'"
     COLUMNS=180 airflow providers behaviours
 
-    local expected_number_of_connections_with_behaviours=11
+    local expected_number_of_connections_with_behaviours=12
     local actual_number_of_connections_with_behaviours
     actual_number_of_connections_with_behaviours=$(airflow providers behaviours --output table | grep -v "===" | \
         grep -v field_behaviours | grep -cv "^ " | xargs)
diff --git a/setup.py b/setup.py
index 51bd9be..9ccd60e 100644
--- a/setup.py
+++ b/setup.py
@@ -454,6 +454,7 @@ tableau = [
 telegram = [
     'python-telegram-bot==13.0',
 ]
+trino = ['trino']
 vertica = [
     'vertica-python>=0.5.1',
 ]
@@ -583,6 +584,7 @@ PROVIDERS_REQUIREMENTS: Dict[str, List[str]] = {
     'ssh': ssh,
     'tableau': tableau,
     'telegram': telegram,
+    'trino': trino,
     'vertica': vertica,
     'yandex': yandex,
     'zendesk': zendesk,
@@ -717,6 +719,7 @@ ALL_DB_PROVIDERS = [
     'neo4j',
     'postgres',
     'presto',
+    'trino',
     'vertica',
 ]
 
@@ -932,7 +935,9 @@ def add_all_provider_packages() -> None:
     add_provider_packages_to_extra_requirements("devel_ci", ALL_PROVIDERS)
     add_provider_packages_to_extra_requirements("devel_all", ALL_PROVIDERS)
     add_provider_packages_to_extra_requirements("all_dbs", ALL_DB_PROVIDERS)
-    add_provider_packages_to_extra_requirements("devel_hadoop", ["apache.hdfs", "apache.hive", "presto"])
+    add_provider_packages_to_extra_requirements(
+        "devel_hadoop", ["apache.hdfs", "apache.hive", "presto", "trino"]
+    )
 
 
 class Develop(develop_orig):
diff --git a/tests/cli/commands/test_connection_command.py b/tests/cli/commands/test_connection_command.py
index ae78892..b974e15 100644
--- a/tests/cli/commands/test_connection_command.py
+++ b/tests/cli/commands/test_connection_command.py
@@ -101,6 +101,10 @@ class TestCliListConnections(unittest.TestCase):
             'sqlite',
         ),
         (
+            'trino_default',
+            'trino',
+        ),
+        (
             'vertica_default',
             'vertica',
         ),
diff --git a/tests/conftest.py b/tests/conftest.py
index ca8c44b..de7e903 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -128,7 +128,7 @@ def pytest_addoption(parser):
         action="append",
         metavar="INTEGRATIONS",
         help="only run tests matching integration specified: "
-        "[cassandra,kerberos,mongo,openldap,presto,rabbitmq,redis]. ",
+        "[cassandra,kerberos,mongo,openldap,rabbitmq,redis,statsd,trino]. ",
     )
     group.addoption(
         "--backend",
diff --git a/tests/core/test_providers_manager.py b/tests/core/test_providers_manager.py
index 9112d5e..5fd0af4 100644
--- a/tests/core/test_providers_manager.py
+++ b/tests/core/test_providers_manager.py
@@ -83,6 +83,7 @@ ALL_PROVIDERS = [
     'apache-airflow-providers-ssh',
     'apache-airflow-providers-tableau',
     'apache-airflow-providers-telegram',
+    'apache-airflow-providers-trino',
     'apache-airflow-providers-vertica',
     'apache-airflow-providers-yandex',
     'apache-airflow-providers-zendesk',
@@ -146,6 +147,7 @@ CONNECTIONS_LIST = [
     'sqoop',
     'ssh',
     'tableau',
+    'trino',
     'vault',
     'vertica',
     'wasb',
diff --git a/tests/providers/google/cloud/transfers/test_trino_to_gcs.py b/tests/providers/google/cloud/transfers/test_trino_to_gcs.py
new file mode 100644
index 0000000..7cb6539
--- /dev/null
+++ b/tests/providers/google/cloud/transfers/test_trino_to_gcs.py
@@ -0,0 +1,331 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import unittest
+from unittest.mock import patch
+
+import pytest
+
+from airflow.providers.google.cloud.transfers.trino_to_gcs import TrinoToGCSOperator
+
+TASK_ID = "test-trino-to-gcs"
+TRINO_CONN_ID = "my-trino-conn"
+GCP_CONN_ID = "my-gcp-conn"
+IMPERSONATION_CHAIN = ["ACCOUNT_1", "ACCOUNT_2", "ACCOUNT_3"]
+SQL = "SELECT * FROM memory.default.test_multiple_types"
+BUCKET = "gs://test"
+FILENAME = "test_{}.ndjson"
+
+NDJSON_LINES = [
+    b'{"some_num": 42, "some_str": "mock_row_content_1"}\n',
+    b'{"some_num": 43, "some_str": "mock_row_content_2"}\n',
+    b'{"some_num": 44, "some_str": "mock_row_content_3"}\n',
+]
+CSV_LINES = [
+    b"some_num,some_str\r\n",
+    b"42,mock_row_content_1\r\n",
+    b"43,mock_row_content_2\r\n",
+    b"44,mock_row_content_3\r\n",
+]
+SCHEMA_FILENAME = "schema_test.json"
+SCHEMA_JSON = b'[{"name": "some_num", "type": "INT64"}, {"name": "some_str", "type": "STRING"}]'
+
+
+@pytest.mark.integration("trino")
+class TestTrinoToGCSOperator(unittest.TestCase):
+    def test_init(self):
+        """Test TrinoToGCSOperator instance is properly initialized."""
+        op = TrinoToGCSOperator(
+            task_id=TASK_ID,
+            sql=SQL,
+            bucket=BUCKET,
+            filename=FILENAME,
+            impersonation_chain=IMPERSONATION_CHAIN,
+        )
+        assert op.task_id == TASK_ID
+        assert op.sql == SQL
+        assert op.bucket == BUCKET
+        assert op.filename == FILENAME
+        assert op.impersonation_chain == IMPERSONATION_CHAIN
+
+    @patch("airflow.providers.google.cloud.transfers.trino_to_gcs.TrinoHook")
+    @patch("airflow.providers.google.cloud.transfers.sql_to_gcs.GCSHook")
+    def test_save_as_json(self, mock_gcs_hook, mock_trino_hook):
+        def _assert_upload(bucket, obj, tmp_filename, mime_type, gzip):
+            assert BUCKET == bucket
+            assert FILENAME.format(0) == obj
+            assert "application/json" == mime_type
+            assert not gzip
+            with open(tmp_filename, "rb") as file:
+                assert b"".join(NDJSON_LINES) == file.read()
+
+        mock_gcs_hook.return_value.upload.side_effect = _assert_upload
+
+        mock_cursor = mock_trino_hook.return_value.get_conn.return_value.cursor
+
+        mock_cursor.return_value.description = [
+            ("some_num", "INTEGER", None, None, None, None, None),
+            ("some_str", "VARCHAR", None, None, None, None, None),
+        ]
+
+        mock_cursor.return_value.fetchone.side_effect = [
+            [42, "mock_row_content_1"],
+            [43, "mock_row_content_2"],
+            [44, "mock_row_content_3"],
+            None,
+        ]
+
+        op = TrinoToGCSOperator(
+            task_id=TASK_ID,
+            sql=SQL,
+            bucket=BUCKET,
+            filename=FILENAME,
+            trino_conn_id=TRINO_CONN_ID,
+            gcp_conn_id=GCP_CONN_ID,
+            impersonation_chain=IMPERSONATION_CHAIN,
+        )
+
+        op.execute(None)
+
+        mock_trino_hook.assert_called_once_with(trino_conn_id=TRINO_CONN_ID)
+        mock_gcs_hook.assert_called_once_with(
+            delegate_to=None,
+            gcp_conn_id=GCP_CONN_ID,
+            impersonation_chain=IMPERSONATION_CHAIN,
+        )
+
+        mock_gcs_hook.return_value.upload.assert_called()
+
+    @patch("airflow.providers.google.cloud.transfers.trino_to_gcs.TrinoHook")
+    @patch("airflow.providers.google.cloud.transfers.sql_to_gcs.GCSHook")
+    def test_save_as_json_with_file_splitting(self, mock_gcs_hook, mock_trino_hook):
+        """Test that ndjson is split by approx_max_file_size_bytes param."""
+
+        expected_upload = {
+            FILENAME.format(0): b"".join(NDJSON_LINES[:2]),
+            FILENAME.format(1): NDJSON_LINES[2],
+        }
+
+        def _assert_upload(bucket, obj, tmp_filename, mime_type, gzip):
+            assert BUCKET == bucket
+            assert "application/json" == mime_type
+            assert not gzip
+            with open(tmp_filename, "rb") as file:
+                assert expected_upload[obj] == file.read()
+
+        mock_gcs_hook.return_value.upload.side_effect = _assert_upload
+
+        mock_cursor = mock_trino_hook.return_value.get_conn.return_value.cursor
+
+        mock_cursor.return_value.description = [
+            ("some_num", "INTEGER", None, None, None, None, None),
+            ("some_str", "VARCHAR(20)", None, None, None, None, None),
+        ]
+
+        mock_cursor.return_value.fetchone.side_effect = [
+            [42, "mock_row_content_1"],
+            [43, "mock_row_content_2"],
+            [44, "mock_row_content_3"],
+            None,
+        ]
+
+        op = TrinoToGCSOperator(
+            task_id=TASK_ID,
+            sql=SQL,
+            bucket=BUCKET,
+            filename=FILENAME,
+            approx_max_file_size_bytes=len(expected_upload[FILENAME.format(0)]),
+        )
+
+        op.execute(None)
+
+        mock_gcs_hook.return_value.upload.assert_called()
+
+    @patch("airflow.providers.google.cloud.transfers.trino_to_gcs.TrinoHook")
+    @patch("airflow.providers.google.cloud.transfers.sql_to_gcs.GCSHook")
+    def test_save_as_json_with_schema_file(self, mock_gcs_hook, mock_trino_hook):
+        """Test writing schema files."""
+
+        def _assert_upload(bucket, obj, tmp_filename, mime_type, gzip):  # pylint: disable=unused-argument
+            if obj == SCHEMA_FILENAME:
+                with open(tmp_filename, "rb") as file:
+                    assert SCHEMA_JSON == file.read()
+
+        mock_gcs_hook.return_value.upload.side_effect = _assert_upload
+
+        mock_cursor = mock_trino_hook.return_value.get_conn.return_value.cursor
+
+        mock_cursor.return_value.description = [
+            ("some_num", "INTEGER", None, None, None, None, None),
+            ("some_str", "VARCHAR", None, None, None, None, None),
+        ]
+
+        mock_cursor.return_value.fetchone.side_effect = [
+            [42, "mock_row_content_1"],
+            [43, "mock_row_content_2"],
+            [44, "mock_row_content_3"],
+            None,
+        ]
+
+        op = TrinoToGCSOperator(
+            task_id=TASK_ID,
+            sql=SQL,
+            bucket=BUCKET,
+            filename=FILENAME,
+            schema_filename=SCHEMA_FILENAME,
+            export_format="csv",
+            trino_conn_id=TRINO_CONN_ID,
+            gcp_conn_id=GCP_CONN_ID,
+        )
+        op.execute(None)
+
+        # once for the file and once for the schema
+        assert 2 == mock_gcs_hook.return_value.upload.call_count
+
+    @patch("airflow.providers.google.cloud.transfers.sql_to_gcs.GCSHook")
+    @patch("airflow.providers.google.cloud.transfers.trino_to_gcs.TrinoHook")
+    def test_save_as_csv(self, mock_trino_hook, mock_gcs_hook):
+        def _assert_upload(bucket, obj, tmp_filename, mime_type, gzip):
+            assert BUCKET == bucket
+            assert FILENAME.format(0) == obj
+            assert "text/csv" == mime_type
+            assert not gzip
+            with open(tmp_filename, "rb") as file:
+                assert b"".join(CSV_LINES) == file.read()
+
+        mock_gcs_hook.return_value.upload.side_effect = _assert_upload
+
+        mock_cursor = mock_trino_hook.return_value.get_conn.return_value.cursor
+
+        mock_cursor.return_value.description = [
+            ("some_num", "INTEGER", None, None, None, None, None),
+            ("some_str", "VARCHAR", None, None, None, None, None),
+        ]
+
+        mock_cursor.return_value.fetchone.side_effect = [
+            [42, "mock_row_content_1"],
+            [43, "mock_row_content_2"],
+            [44, "mock_row_content_3"],
+            None,
+        ]
+
+        op = TrinoToGCSOperator(
+            task_id=TASK_ID,
+            sql=SQL,
+            bucket=BUCKET,
+            filename=FILENAME,
+            export_format="csv",
+            trino_conn_id=TRINO_CONN_ID,
+            gcp_conn_id=GCP_CONN_ID,
+            impersonation_chain=IMPERSONATION_CHAIN,
+        )
+
+        op.execute(None)
+
+        mock_gcs_hook.return_value.upload.assert_called()
+
+        mock_trino_hook.assert_called_once_with(trino_conn_id=TRINO_CONN_ID)
+        mock_gcs_hook.assert_called_once_with(
+            delegate_to=None,
+            gcp_conn_id=GCP_CONN_ID,
+            impersonation_chain=IMPERSONATION_CHAIN,
+        )
+
+    @patch("airflow.providers.google.cloud.transfers.trino_to_gcs.TrinoHook")
+    @patch("airflow.providers.google.cloud.transfers.sql_to_gcs.GCSHook")
+    def test_save_as_csv_with_file_splitting(self, mock_gcs_hook, mock_trino_hook):
+        """Test that csv is split by approx_max_file_size_bytes param."""
+
+        expected_upload = {
+            FILENAME.format(0): b"".join(CSV_LINES[:3]),
+            FILENAME.format(1): b"".join([CSV_LINES[0], CSV_LINES[3]]),
+        }
+
+        def _assert_upload(bucket, obj, tmp_filename, mime_type, gzip):
+            assert BUCKET == bucket
+            assert "text/csv" == mime_type
+            assert not gzip
+            with open(tmp_filename, "rb") as file:
+                assert expected_upload[obj] == file.read()
+
+        mock_gcs_hook.return_value.upload.side_effect = _assert_upload
+
+        mock_cursor = mock_trino_hook.return_value.get_conn.return_value.cursor
+
+        mock_cursor.return_value.description = [
+            ("some_num", "INTEGER", None, None, None, None, None),
+            ("some_str", "VARCHAR(20)", None, None, None, None, None),
+        ]
+
+        mock_cursor.return_value.fetchone.side_effect = [
+            [42, "mock_row_content_1"],
+            [43, "mock_row_content_2"],
+            [44, "mock_row_content_3"],
+            None,
+        ]
+
+        op = TrinoToGCSOperator(
+            task_id=TASK_ID,
+            sql=SQL,
+            bucket=BUCKET,
+            filename=FILENAME,
+            approx_max_file_size_bytes=len(expected_upload[FILENAME.format(0)]),
+            export_format="csv",
+        )
+
+        op.execute(None)
+
+        mock_gcs_hook.return_value.upload.assert_called()
+
+    @patch("airflow.providers.google.cloud.transfers.trino_to_gcs.TrinoHook")
+    @patch("airflow.providers.google.cloud.transfers.sql_to_gcs.GCSHook")
+    def test_save_as_csv_with_schema_file(self, mock_gcs_hook, mock_trino_hook):
+        """Test writing schema files."""
+
+        def _assert_upload(bucket, obj, tmp_filename, mime_type, gzip):  # pylint: disable=unused-argument
+            if obj == SCHEMA_FILENAME:
+                with open(tmp_filename, "rb") as file:
+                    assert SCHEMA_JSON == file.read()
+
+        mock_gcs_hook.return_value.upload.side_effect = _assert_upload
+
+        mock_cursor = mock_trino_hook.return_value.get_conn.return_value.cursor
+
+        mock_cursor.return_value.description = [
+            ("some_num", "INTEGER", None, None, None, None, None),
+            ("some_str", "VARCHAR", None, None, None, None, None),
+        ]
+
+        mock_cursor.return_value.fetchone.side_effect = [
+            [42, "mock_row_content_1"],
+            [43, "mock_row_content_2"],
+            [44, "mock_row_content_3"],
+            None,
+        ]
+
+        op = TrinoToGCSOperator(
+            task_id=TASK_ID,
+            sql=SQL,
+            bucket=BUCKET,
+            filename=FILENAME,
+            schema_filename=SCHEMA_FILENAME,
+            export_format="csv",
+        )
+        op.execute(None)
+
+        # once for the file and once for the schema
+        assert 2 == mock_gcs_hook.return_value.upload.call_count
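
The unit tests above drive TrinoToGCSOperator entirely through mocks. For orientation, a minimal sketch of
how the same operator would be wired into a DAG - the dag_id, schedule and start date are illustrative,
while the sql/bucket/filename values mirror the constants used in the tests:

    from airflow import DAG
    from airflow.providers.google.cloud.transfers.trino_to_gcs import TrinoToGCSOperator
    from airflow.utils.dates import days_ago

    with DAG(dag_id="example_trino_to_gcs_sketch", start_date=days_ago(1), schedule_interval=None) as dag:
        TrinoToGCSOperator(
            task_id="trino_to_gcs",
            sql="SELECT * FROM memory.default.test_multiple_types",
            bucket="gs://test",
            filename="test_{}.ndjson",
            # trino_conn_id / gcp_conn_id are omitted and fall back to the provider defaults
        )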
diff --git a/tests/providers/google/cloud/transfers/test_trino_to_gcs_system.py b/tests/providers/google/cloud/transfers/test_trino_to_gcs_system.py
new file mode 100644
index 0000000..00d5716
--- /dev/null
+++ b/tests/providers/google/cloud/transfers/test_trino_to_gcs_system.py
@@ -0,0 +1,169 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import os
+from contextlib import closing, suppress
+
+import pytest
+
+from airflow.models import Connection
+from airflow.providers.trino.hooks.trino import TrinoHook
+from tests.providers.google.cloud.utils.gcp_authenticator import GCP_BIGQUERY_KEY, GCP_GCS_KEY
+from tests.test_utils.gcp_system_helpers import CLOUD_DAG_FOLDER, GoogleSystemTest, provide_gcp_context
+
+try:
+    from airflow.utils.session import create_session
+except ImportError:
+    # This is a hack to import create_session from old destination and
+    # fool the pre-commit check that looks for old imports...
+    # TODO remove this once we don't need to test this on 1.10
+    import importlib
+
+    db_module = importlib.import_module("airflow.utils.db")
+    create_session = getattr(db_module, "create_session")
+
+
+GCS_BUCKET = os.environ.get("GCP_TRINO_TO_GCS_BUCKET_NAME", "test-trino-to-gcs-bucket")
+DATASET_NAME = os.environ.get("GCP_TRINO_TO_GCS_DATASET_NAME", "test_trino_to_gcs_dataset")
+
+CREATE_QUERY = """
+CREATE TABLE memory.default.test_multiple_types (
+  -- Boolean
+  z_boolean BOOLEAN,
+  -- Integers
+  z_tinyint TINYINT,
+  z_smallint SMALLINT,
+  z_integer INTEGER,
+  z_bigint BIGINT,
+  -- Floating-Point
+  z_real REAL,
+  z_double DOUBLE,
+  -- Fixed-Point
+  z_decimal DECIMAL(10,2),
+  -- String
+  z_varchar VARCHAR(20),
+  z_char CHAR(20),
+  z_varbinary VARBINARY,
+  z_json JSON,
+  -- Date and Time
+  z_date DATE,
+  z_time TIME,
+  z_time_with_time_zone TIME WITH TIME ZONE,
+  z_timestamp TIMESTAMP,
+  z_timestamp_with_time_zone TIMESTAMP WITH TIME ZONE,
+  -- Network Address
+  z_ipaddress_v4 IPADDRESS,
+  z_ipaddress_v6 IPADDRESS,
+  -- UUID
+  z_uuid UUID
+)
+"""
+
+LOAD_QUERY = """
+INSERT INTO memory.default.test_multiple_types VALUES(
+  -- Boolean
+  true,                                                    -- z_boolean BOOLEAN,
+  -- Integers
+  CAST(POW(2, 7 ) - 42 AS TINYINT),                        -- z_tinyint TINYINT,
+  CAST(POW(2, 15) - 42 AS SMALLINT),                       -- z_smallint SMALLINT,
+  CAST(POW(2, 31) - 42 AS INTEGER),                        -- z_integer INTEGER,
+  CAST(POW(2, 32) - 42 AS BIGINT) * 2,                     -- z_bigint BIGINT,
+  -- Floating-Point
+  REAL '42',                                               -- z_real REAL,
+  DOUBLE '1.03e42',                                        -- z_double DOUBLE,
+  -- Fixed-Point
+  DECIMAL '1.1',                                           -- z_decimal DECIMAL(10, 2),
+  -- String
+  U&'Hello winter \2603 !',                                -- z_varchar VARCHAR(20),
+  'cat',                                                   -- z_char CHAR(20),
+  X'65683F',                                               -- z_varbinary VARBINARY,
+  CAST('["A", 1, true]' AS JSON),                          -- z_json JSON,
+  -- Date and Time
+  DATE '2001-08-22',                                       -- z_date DATE,
+  TIME '01:02:03.456',                                     -- z_time TIME,
+  TIME '01:02:03.456 America/Los_Angeles',                 -- z_time_with_time_zone TIME WITH TIME ZONE,
+  TIMESTAMP '2001-08-22 03:04:05.321',                     -- z_timestamp TIMESTAMP,
+  TIMESTAMP '2001-08-22 03:04:05.321 America/Los_Angeles', -- z_timestamp_with_time_zone TIMESTAMP WITH TIME
+                                                           -- ZONE,
+  -- Network Address
+  IPADDRESS '10.0.0.1',                                    -- z_ipaddress_v4 IPADDRESS,
+  IPADDRESS '2001:db8::1',                                 -- z_ipaddress_v6 IPADDRESS,
+  -- UUID
+  UUID '12151fd2-7586-11e9-8f9e-2a86e4085a59'              -- z_uuid UUID
+)
+"""
+DELETE_QUERY = "DROP TABLE memory.default.test_multiple_types"
+
+
+@pytest.mark.integration("trino")
+class TrinoToGCSSystemTest(GoogleSystemTest):
+    @staticmethod
+    def init_connection():
+        with create_session() as session:
+            session.query(Connection).filter(Connection.conn_id == "trino_default").delete()
+            session.merge(
+                Connection(
+                    conn_id="trino_default", conn_type="conn_type", host="trino", port=8080, login="airflow"
+                )
+            )
+
+    @staticmethod
+    def init_db():
+        hook = TrinoHook()
+        with hook.get_conn() as conn:
+            with closing(conn.cursor()) as cur:
+                cur.execute(CREATE_QUERY)
+                # Trino does not execute queries until the result is fetched. :-(
+                cur.fetchone()
+                cur.execute(LOAD_QUERY)
+                cur.fetchone()
+
+    @staticmethod
+    def drop_db():
+        hook = TrinoHook()
+        with hook.get_conn() as conn:
+            with closing(conn.cursor()) as cur:
+                cur.execute(DELETE_QUERY)
+                # Trino does not execute queries until the result is fetched. :-(
+                cur.fetchone()
+
+    @provide_gcp_context(GCP_GCS_KEY)
+    def setUp(self):
+        super().setUp()
+        self.init_connection()
+        self.create_gcs_bucket(GCS_BUCKET)
+        with suppress(Exception):
+            self.drop_db()
+        self.init_db()
+        self.execute_with_ctx(
+            ["bq", "rm", "--recursive", "--force", f"{self._project_id()}:{DATASET_NAME}"],
+            key=GCP_BIGQUERY_KEY,
+        )
+
+    @provide_gcp_context(GCP_BIGQUERY_KEY)
+    def test_run_example_dag(self):
+        self.run_dag("example_trino_to_gcs", CLOUD_DAG_FOLDER)
+
+    @provide_gcp_context(GCP_GCS_KEY)
+    def tearDown(self):
+        self.delete_gcs_bucket(GCS_BUCKET)
+        self.drop_db()
+        self.execute_with_ctx(
+            ["bq", "rm", "--recursive", "--force", f"{self._project_id()}:{DATASET_NAME}"],
+            key=GCP_BIGQUERY_KEY,
+        )
+        super().tearDown()
diff --git a/tests/providers/mysql/transfers/test_trino_to_mysql.py b/tests/providers/mysql/transfers/test_trino_to_mysql.py
new file mode 100644
index 0000000..2e23169
--- /dev/null
+++ b/tests/providers/mysql/transfers/test_trino_to_mysql.py
@@ -0,0 +1,73 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import os
+import unittest
+from unittest.mock import patch
+
+from airflow.providers.mysql.transfers.trino_to_mysql import TrinoToMySqlOperator
+from tests.providers.apache.hive import DEFAULT_DATE, TestHiveEnvironment
+
+
+class TestTrinoToMySqlTransfer(TestHiveEnvironment):
+    def setUp(self):
+        self.kwargs = dict(
+            sql='sql',
+            mysql_table='mysql_table',
+            task_id='test_trino_to_mysql_transfer',
+        )
+        super().setUp()
+
+    @patch('airflow.providers.mysql.transfers.trino_to_mysql.MySqlHook')
+    @patch('airflow.providers.mysql.transfers.trino_to_mysql.TrinoHook')
+    def test_execute(self, mock_trino_hook, mock_mysql_hook):
+        TrinoToMySqlOperator(**self.kwargs).execute(context={})
+
+        mock_trino_hook.return_value.get_records.assert_called_once_with(self.kwargs['sql'])
+        mock_mysql_hook.return_value.insert_rows.assert_called_once_with(
+            table=self.kwargs['mysql_table'], rows=mock_trino_hook.return_value.get_records.return_value
+        )
+
+    @patch('airflow.providers.mysql.transfers.trino_to_mysql.MySqlHook')
+    @patch('airflow.providers.mysql.transfers.trino_to_mysql.TrinoHook')
+    def test_execute_with_mysql_preoperator(self, mock_trino_hook, mock_mysql_hook):
+        self.kwargs.update(dict(mysql_preoperator='mysql_preoperator'))
+
+        TrinoToMySqlOperator(**self.kwargs).execute(context={})
+
+        mock_trino_hook.return_value.get_records.assert_called_once_with(self.kwargs['sql'])
+        mock_mysql_hook.return_value.run.assert_called_once_with(self.kwargs['mysql_preoperator'])
+        mock_mysql_hook.return_value.insert_rows.assert_called_once_with(
+            table=self.kwargs['mysql_table'], rows=mock_trino_hook.return_value.get_records.return_value
+        )
+
+    @unittest.skipIf(
+        'AIRFLOW_RUNALL_TESTS' not in os.environ, "Skipped because AIRFLOW_RUNALL_TESTS is not set"
+    )
+    def test_trino_to_mysql(self):
+        op = TrinoToMySqlOperator(
+            task_id='trino_to_mysql_check',
+            sql="""
+                SELECT name, count(*) as ccount
+                FROM airflow.static_babynames
+                GROUP BY name
+                """,
+            mysql_table='test_static_babynames',
+            mysql_preoperator='TRUNCATE TABLE test_static_babynames;',
+            dag=self.dag,
+        )
+        op.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)
diff --git a/tests/providers/presto/hooks/test_presto.py b/tests/providers/presto/hooks/test_presto.py
index f9e8587..e6ebb73 100644
--- a/tests/providers/presto/hooks/test_presto.py
+++ b/tests/providers/presto/hooks/test_presto.py
@@ -206,28 +206,3 @@ class TestPrestoHook(unittest.TestCase):
         assert result_sets[1][0] == df.values.tolist()[1][0]
 
         self.cur.execute.assert_called_once_with(statement, None)
-
-
-class TestPrestoHookIntegration(unittest.TestCase):
-    @pytest.mark.integration("presto")
-    @mock.patch.dict('os.environ', AIRFLOW_CONN_PRESTO_DEFAULT="presto://airflow@presto:8080/")
-    def test_should_record_records(self):
-        hook = PrestoHook()
-        sql = "SELECT name FROM tpch.sf1.customer ORDER BY custkey ASC LIMIT 3"
-        records = hook.get_records(sql)
-        assert [['Customer#000000001'], ['Customer#000000002'], ['Customer#000000003']] == records
-
-    @pytest.mark.integration("presto")
-    @pytest.mark.integration("kerberos")
-    def test_should_record_records_with_kerberos_auth(self):
-        conn_url = (
-            'presto://airflow@presto:7778/?'
-            'auth=kerberos&kerberos__service_name=HTTP&'
-            'verify=False&'
-            'protocol=https'
-        )
-        with mock.patch.dict('os.environ', AIRFLOW_CONN_PRESTO_DEFAULT=conn_url):
-            hook = PrestoHook()
-            sql = "SELECT name FROM tpch.sf1.customer ORDER BY custkey ASC LIMIT 3"
-            records = hook.get_records(sql)
-            assert [['Customer#000000001'], ['Customer#000000002'], ['Customer#000000003']] == records
diff --git a/scripts/ci/docker-compose/integration-redis.yml b/tests/providers/trino/__init__.py
similarity index 59%
copy from scripts/ci/docker-compose/integration-redis.yml
copy to tests/providers/trino/__init__.py
index ab353d2..217e5db 100644
--- a/scripts/ci/docker-compose/integration-redis.yml
+++ b/tests/providers/trino/__init__.py
@@ -1,3 +1,4 @@
+#
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
 # distributed with this work for additional information
@@ -14,29 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
----
-version: "2.2"
-services:
-  redis:
-    image: redis:5.0.1
-    volumes:
-      - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
-      - redis-db-volume:/data/presto
-    ports:
-      - "${REDIS_HOST_PORT}:6379"
-    healthcheck:
-      test: ["CMD", "redis-cli", "ping"]
-      interval: 5s
-      timeout: 30s
-      retries: 50
-    restart: always
-
-  airflow:
-    environment:
-      - INTEGRATION_REDIS=true
-    depends_on:
-      redis:
-        condition: service_healthy
-
-volumes:
-  redis-db-volume:
diff --git a/scripts/ci/docker-compose/integration-redis.yml b/tests/providers/trino/hooks/__init__.py
similarity index 59%
copy from scripts/ci/docker-compose/integration-redis.yml
copy to tests/providers/trino/hooks/__init__.py
index ab353d2..217e5db 100644
--- a/scripts/ci/docker-compose/integration-redis.yml
+++ b/tests/providers/trino/hooks/__init__.py
@@ -1,3 +1,4 @@
+#
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
 # distributed with this work for additional information
@@ -14,29 +15,3 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
----
-version: "2.2"
-services:
-  redis:
-    image: redis:5.0.1
-    volumes:
-      - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
-      - redis-db-volume:/data/presto
-    ports:
-      - "${REDIS_HOST_PORT}:6379"
-    healthcheck:
-      test: ["CMD", "redis-cli", "ping"]
-      interval: 5s
-      timeout: 30s
-      retries: 50
-    restart: always
-
-  airflow:
-    environment:
-      - INTEGRATION_REDIS=true
-    depends_on:
-      redis:
-        condition: service_healthy
-
-volumes:
-  redis-db-volume:
diff --git a/tests/providers/presto/hooks/test_presto.py b/tests/providers/trino/hooks/test_trino.py
similarity index 82%
copy from tests/providers/presto/hooks/test_presto.py
copy to tests/providers/trino/hooks/test_trino.py
index f9e8587..e649d2b 100644
--- a/tests/providers/presto/hooks/test_presto.py
+++ b/tests/providers/trino/hooks/test_trino.py
@@ -24,23 +24,23 @@ from unittest.mock import patch
 
 import pytest
 from parameterized import parameterized
-from prestodb.transaction import IsolationLevel
+from trino.transaction import IsolationLevel
 
 from airflow import AirflowException
 from airflow.models import Connection
-from airflow.providers.presto.hooks.presto import PrestoHook
+from airflow.providers.trino.hooks.trino import TrinoHook
 
 
-class TestPrestoHookConn(unittest.TestCase):
-    @patch('airflow.providers.presto.hooks.presto.prestodb.auth.BasicAuthentication')
-    @patch('airflow.providers.presto.hooks.presto.prestodb.dbapi.connect')
-    @patch('airflow.providers.presto.hooks.presto.PrestoHook.get_connection')
+class TestTrinoHookConn(unittest.TestCase):
+    @patch('airflow.providers.trino.hooks.trino.trino.auth.BasicAuthentication')
+    @patch('airflow.providers.trino.hooks.trino.trino.dbapi.connect')
+    @patch('airflow.providers.trino.hooks.trino.TrinoHook.get_connection')
     def test_get_conn_basic_auth(self, mock_get_connection, mock_connect, mock_basic_auth):
         mock_get_connection.return_value = Connection(
             login='login', password='password', host='host', schema='hive'
         )
 
-        conn = PrestoHook().get_conn()
+        conn = TrinoHook().get_conn()
         mock_connect.assert_called_once_with(
             catalog='hive',
             host='host',
@@ -55,7 +55,7 @@ class TestPrestoHookConn(unittest.TestCase):
         mock_basic_auth.assert_called_once_with('login', 'password')
         assert mock_connect.return_value == conn
 
-    @patch('airflow.providers.presto.hooks.presto.PrestoHook.get_connection')
+    @patch('airflow.providers.trino.hooks.trino.TrinoHook.get_connection')
     def test_get_conn_invalid_auth(self, mock_get_connection):
         mock_get_connection.return_value = Connection(
             login='login',
@@ -67,11 +67,11 @@ class TestPrestoHookConn(unittest.TestCase):
         with pytest.raises(
             AirflowException, match=re.escape("Kerberos authorization doesn't support password.")
         ):
-            PrestoHook().get_conn()
+            TrinoHook().get_conn()
 
-    @patch('airflow.providers.presto.hooks.presto.prestodb.auth.KerberosAuthentication')
-    @patch('airflow.providers.presto.hooks.presto.prestodb.dbapi.connect')
-    @patch('airflow.providers.presto.hooks.presto.PrestoHook.get_connection')
+    @patch('airflow.providers.trino.hooks.trino.trino.auth.KerberosAuthentication')
+    @patch('airflow.providers.trino.hooks.trino.trino.dbapi.connect')
+    @patch('airflow.providers.trino.hooks.trino.TrinoHook.get_connection')
     def test_get_conn_kerberos_auth(self, mock_get_connection, mock_connect, mock_auth):
         mock_get_connection.return_value = Connection(
             login='login',
@@ -93,7 +93,7 @@ class TestPrestoHookConn(unittest.TestCase):
             ),
         )
 
-        conn = PrestoHook().get_conn()
+        conn = TrinoHook().get_conn()
         mock_connect.assert_called_once_with(
             catalog='hive',
             host='host',
@@ -128,8 +128,8 @@ class TestPrestoHookConn(unittest.TestCase):
         ]
     )
     def test_get_conn_verify(self, current_verify, expected_verify):
-        patcher_connect = patch('airflow.providers.presto.hooks.presto.prestodb.dbapi.connect')
-        patcher_get_connections = patch('airflow.providers.presto.hooks.presto.PrestoHook.get_connection')
+        patcher_connect = patch('airflow.providers.trino.hooks.trino.trino.dbapi.connect')
+        patcher_get_connections = patch('airflow.providers.trino.hooks.trino.TrinoHook.get_connection')
 
         with patcher_connect as mock_connect, patcher_get_connections as mock_get_connection:
             mock_get_connection.return_value = Connection(
@@ -138,12 +138,12 @@ class TestPrestoHookConn(unittest.TestCase):
             mock_verify = mock.PropertyMock()
             type(mock_connect.return_value._http_session).verify = mock_verify
 
-            conn = PrestoHook().get_conn()
+            conn = TrinoHook().get_conn()
             mock_verify.assert_called_once_with(expected_verify)
             assert mock_connect.return_value == conn
 
 
-class TestPrestoHook(unittest.TestCase):
+class TestTrinoHook(unittest.TestCase):
     def setUp(self):
         super().setUp()
 
@@ -152,7 +152,7 @@ class TestPrestoHook(unittest.TestCase):
         self.conn.cursor.return_value = self.cur
         conn = self.conn
 
-        class UnitTestPrestoHook(PrestoHook):
+        class UnitTestTrinoHook(TrinoHook):
             conn_name_attr = 'test_conn_id'
 
             def get_conn(self):
@@ -161,7 +161,7 @@ class TestPrestoHook(unittest.TestCase):
             def get_isolation_level(self):
                 return IsolationLevel.READ_COMMITTED
 
-        self.db_hook = UnitTestPrestoHook()
+        self.db_hook = UnitTestTrinoHook()
 
     @patch('airflow.hooks.dbapi.DbApiHook.insert_rows')
     def test_insert_rows(self, mock_insert_rows):
@@ -208,26 +208,26 @@ class TestPrestoHook(unittest.TestCase):
         self.cur.execute.assert_called_once_with(statement, None)
 
 
-class TestPrestoHookIntegration(unittest.TestCase):
-    @pytest.mark.integration("presto")
-    @mock.patch.dict('os.environ', AIRFLOW_CONN_PRESTO_DEFAULT="presto://airflow@presto:8080/")
+class TestTrinoHookIntegration(unittest.TestCase):
+    @pytest.mark.integration("trino")
+    @mock.patch.dict('os.environ', AIRFLOW_CONN_TRINO_DEFAULT="trino://airflow@trino:8080/")
     def test_should_record_records(self):
-        hook = PrestoHook()
+        hook = TrinoHook()
         sql = "SELECT name FROM tpch.sf1.customer ORDER BY custkey ASC LIMIT 3"
         records = hook.get_records(sql)
         assert [['Customer#000000001'], ['Customer#000000002'], ['Customer#000000003']] == records
 
-    @pytest.mark.integration("presto")
+    @pytest.mark.integration("trino")
     @pytest.mark.integration("kerberos")
     def test_should_record_records_with_kerberos_auth(self):
         conn_url = (
-            'presto://airflow@presto:7778/?'
+            'trino://airflow@trino:7778/?'
             'auth=kerberos&kerberos__service_name=HTTP&'
             'verify=False&'
             'protocol=https'
         )
-        with mock.patch.dict('os.environ', AIRFLOW_CONN_PRESTO_DEFAULT=conn_url):
-            hook = PrestoHook()
+        with mock.patch.dict('os.environ', AIRFLOW_CONN_TRINO_DEFAULT=conn_url):
+            hook = TrinoHook()
             sql = "SELECT name FROM tpch.sf1.customer ORDER BY custkey ASC LIMIT 3"
             records = hook.get_records(sql)
             assert [['Customer#000000001'], ['Customer#000000002'], ['Customer#000000003']] == records
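
Outside of pytest, the same hook can be exercised directly. A minimal sketch, assuming a Trino server
reachable as "trino:8080" (the host used by the integration tests above) and the connection supplied via
the standard AIRFLOW_CONN_* environment variable:

    import os

    from airflow.providers.trino.hooks.trino import TrinoHook

    # same connection URI the integration test injects via mock.patch.dict
    os.environ["AIRFLOW_CONN_TRINO_DEFAULT"] = "trino://airflow@trino:8080/"

    hook = TrinoHook()  # resolves the trino_default connection from the environment
    rows = hook.get_records("SELECT name FROM tpch.sf1.customer ORDER BY custkey ASC LIMIT 3")
    print(rows)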

[airflow] 04/36: Removes unused CI feature of printing output on error (#15190)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 1c046423a9e507e861030f919344414facaff2b1
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Sun Apr 4 22:20:11 2021 +0200

    Removes unused CI feature of printing output on error (#15190)
    
    Fixes: #13924
    (cherry picked from commit 7c17bf0d1e828b454a6b2c7245ded275b313c792)
---
 scripts/in_container/_in_container_utils.sh | 30 +++--------------------------
 1 file changed, 3 insertions(+), 27 deletions(-)

diff --git a/scripts/in_container/_in_container_utils.sh b/scripts/in_container/_in_container_utils.sh
index 7d80f00..ad3083e 100644
--- a/scripts/in_container/_in_container_utils.sh
+++ b/scripts/in_container/_in_container_utils.sh
@@ -54,16 +54,6 @@ function assert_in_container() {
 }
 
 function in_container_script_start() {
-    OUTPUT_PRINTED_ONLY_ON_ERROR=$(mktemp)
-    export OUTPUT_PRINTED_ONLY_ON_ERROR
-    readonly OUTPUT_PRINTED_ONLY_ON_ERROR
-
-    if [[ ${VERBOSE=} == "true" && ${GITHUB_ACTIONS=} != "true" ]]; then
-        echo
-        echo "Output is redirected to ${OUTPUT_PRINTED_ONLY_ON_ERROR} and will be printed on error only"
-        echo
-    fi
-
     if [[ ${VERBOSE_COMMANDS:="false"} == "true" ]]; then
         set -x
     fi
@@ -74,23 +64,9 @@ function in_container_script_end() {
     EXIT_CODE=$?
     if [[ ${EXIT_CODE} != 0 ]]; then
         if [[ "${PRINT_INFO_FROM_SCRIPTS="true"}" == "true" ]]; then
-            if [[ -f "${OUTPUT_PRINTED_ONLY_ON_ERROR}" ]]; then
-                echo "###########################################################################################"
-                echo
-                echo "${COLOR_BLUE} EXIT CODE: ${EXIT_CODE} in container (See above for error message). Below is the output of the last action! ${COLOR_RESET}"
-                echo
-                echo "${COLOR_BLUE}***  BEGINNING OF THE LAST COMMAND OUTPUT *** ${COLOR_RESET}"
-                cat "${OUTPUT_PRINTED_ONLY_ON_ERROR}"
-                echo "${COLOR_BLUE}***  END OF THE LAST COMMAND OUTPUT ***  ${COLOR_RESET}"
-                echo
-                echo "${COLOR_BLUE} EXIT CODE: ${EXIT_CODE} in container. The actual error might be above the output!  ${COLOR_RESET}"
-                echo
-                echo "###########################################################################################"
-            else
-                echo "########################################################################################################################"
-                echo "${COLOR_BLUE} [IN CONTAINER]   EXITING ${0} WITH EXIT CODE ${EXIT_CODE}  ${COLOR_RESET}"
-                echo "########################################################################################################################"
-            fi
+            echo "########################################################################################################################"
+            echo "${COLOR_BLUE} [IN CONTAINER]   EXITING ${0} WITH EXIT CODE ${EXIT_CODE}  ${COLOR_RESET}"
+            echo "########################################################################################################################"
         fi
     fi
 

[airflow] 28/36: Unable to trigger backfill or manual jobs with Kubernetes executor. (#14160)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 9b4b3563c37d8eced204270e0d7fbf56348cc313
Author: Max Taggart <ma...@gmail.com>
AuthorDate: Thu Feb 25 12:16:55 2021 -0700

    Unable to trigger backfill or manual jobs with Kubernetes executor. (#14160)
    
    closes: #13805
    (cherry picked from commit 2b5d4e3ff3c61ea6074caa300bbb8d16027408a6)
---
 airflow/jobs/backfill_job.py    |  1 +
 airflow/www/views.py            |  1 +
 tests/jobs/test_backfill_job.py | 17 +++++++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/airflow/jobs/backfill_job.py b/airflow/jobs/backfill_job.py
index 0d3d057..a16f261 100644
--- a/airflow/jobs/backfill_job.py
+++ b/airflow/jobs/backfill_job.py
@@ -785,6 +785,7 @@ class BackfillJob(BaseJob):
             pickle_id = pickle.id
 
         executor = self.executor
+        executor.job_id = "backfill"
         executor.start()
 
         ti_status.total_runs = len(run_dates)  # total dag runs in backfill
diff --git a/airflow/www/views.py b/airflow/www/views.py
index 5f4c8c5..a38b9cc 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -1377,6 +1377,7 @@ class Airflow(AirflowBaseView):  # noqa: D101  pylint: disable=too-many-public-m
             )
             return redirect(origin)
 
+        executor.job_id = "manual"
         executor.start()
         executor.queue_task_instance(
             ti,
diff --git a/tests/jobs/test_backfill_job.py b/tests/jobs/test_backfill_job.py
index 3139f36..c6f620a 100644
--- a/tests/jobs/test_backfill_job.py
+++ b/tests/jobs/test_backfill_job.py
@@ -1517,3 +1517,20 @@ class TestBackfillJob(unittest.TestCase):
         job.run()
         dr: DagRun = dag.get_last_dagrun()
         assert dr.creating_job_id == job.id
+
+    def test_backfill_has_job_id(self):
+        """Make sure that backfill jobs are assigned job_ids."""
+        dag = self.dagbag.get_dag("test_start_date_scheduling")
+        dag.clear()
+
+        executor = MockExecutor(parallelism=16)
+
+        job = BackfillJob(
+            executor=executor,
+            dag=dag,
+            start_date=DEFAULT_DATE,
+            end_date=DEFAULT_DATE + datetime.timedelta(days=1),
+            run_backwards=True,
+        )
+        job.run()
+        assert executor.job_id is not None
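
The fix itself is the one-line executor.job_id assignment before executor.start() in backfill_job.py and
views.py, since the KubernetesExecutor expects a job id to be set before start(). A minimal sketch of the
pattern with a stand-in executor class (not Airflow code):

    class StubExecutor:
        """Hypothetical stand-in that mimics the job_id requirement."""

        def __init__(self):
            self.job_id = None

        def start(self):
            if self.job_id is None:
                raise RuntimeError("executor started without a job_id")

    executor = StubExecutor()
    executor.job_id = "backfill"   # what the backfill job now sets before start()
    executor.start()               # no longer raises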

[airflow] 22/36: Add new Committers to docs (#15235)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit ab570a0f4a06cc19ef56736472b5df9ba88e2141
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Tue Apr 6 20:42:20 2021 +0100

    Add new Committers to docs (#15235)
    
    Announcement Link: https://lists.apache.org/thread.html/rcc95b5e04b14d971567369626eb72411140a31404094582b2769c992%40%3Cdev.airflow.apache.org%3E
    
    (cherry picked from commit 12d8e4b62aa03fe54cf98e88bf63b205f4faf390)
---
 .github/workflows/ci.yml        | 4 +++-
 docs/apache-airflow/project.rst | 2 ++
 docs/spelling_wordlist.txt      | 6 ++++++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 243557f..edd499c 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -117,7 +117,9 @@ jobs:
             "zhongjiajie",
             "ephraimbuddy",
             "jhtimmins",
-            "dstandish"
+            "dstandish",
+            "xinbinhuang",
+            "yuqian"
           ]'), github.actor)
         ) && github.repository == 'apache/airflow'
       ) && 'self-hosted' || 'ubuntu-20.04' }}
diff --git a/docs/apache-airflow/project.rst b/docs/apache-airflow/project.rst
index bd5bb37..0a97e0f 100644
--- a/docs/apache-airflow/project.rst
+++ b/docs/apache-airflow/project.rst
@@ -65,6 +65,7 @@ Committers
 - Kevin Yang (@KevinYang21)
 - Leah Cole (@leahecole)
 - Maxime "Max" Beauchemin (@mistercrunch)
+- Qian Yu (@yuqian90)
 - Qingping Hou (@houqp)
 - Ry Walker (@ryw)
 - Ryan Hamilton (@ryanahamilton)
@@ -74,6 +75,7 @@ Committers
 - Tomasz Urbaszek (@turbaszek)
 - Vikram Koka (@vikramkoka)
 - Xiaodong Deng (@XD-DENG)
+- Xinbin Huang (@xinbinhuang)
 
 For the full list of contributors, take a look at `Airflow's GitHub
 Contributor page:
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index 784c1bd..23a3681 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -183,6 +183,7 @@ Hou
 Http
 HttpError
 HttpRequest
+Huang
 IdP
 ImageAnnotatorClient
 Imap
@@ -300,6 +301,7 @@ Pyarrow
 Pylint
 Pyspark
 PythonOperator
+Qian
 Qingping
 Qplum
 Quantopian
@@ -409,9 +411,11 @@ XComs
 Xcom
 Xero
 Xiaodong
+Xinbin
 Yamllint
 Yandex
 Yieldr
+Yu
 Zego
 Zendesk
 Zhong
@@ -1416,6 +1420,7 @@ www
 xcom
 xcomarg
 xcomresult
+xinbinhuang
 xml
 xpath
 xyz
@@ -1426,6 +1431,7 @@ yarnpkg
 yml
 youtrack
 youtube
+yuqian
 zA
 zendesk
 zhongjiajie

[airflow] 14/36: Better compatibility/diagnostics for arbitrary UID in docker image (#15162)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 6860da472bc63acf4bbc85b7d7eaeba5d502ba84
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Thu Apr 8 19:28:36 2021 +0200

    Better compatibility/diagnostics for arbitrary UID in docker image (#15162)
    
    The PROD image of airflow is OpenShift compatible and it can be
    run with either 'airflow' user (UID=50000) or with any other
    user with (GID=0).
    
    This change adds umask 0002 to make sure that whenever the image
    is extended and new directories get created, the directories are
    group-writeable for GID=0. This is added in the default
    entrypoint.
    
    The entrypoint will fail if it is not run as airflow user or if
    other, arbitrary user is used with GID != 0.
    
    Fixes: #15107
    (cherry picked from commit ce91872eccceb8fb6277012a909ad6b529a071d2)
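
In concrete terms the new check accepts any UID as long as the group is root. A quick illustration with
docker run - the image tag is a placeholder for any image that already contains this entrypoint:

    # accepted: the default airflow user (UID=50000) with the root group
    docker run --rm --user 50000:0 apache/airflow:2.0.2 bash -c 'id'
    # accepted: an arbitrary UID, as long as GID is 0
    docker run --rm --user 4711:0 apache/airflow:2.0.2 bash -c 'id'
    # rejected: an arbitrary UID with a non-root group - the entrypoint exits with an error
    docker run --rm --user 4711:4711 apache/airflow:2.0.2 bash -c 'id'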
---
 Dockerfile                                         |  2 +-
 chart/values.yaml                                  |  2 +-
 docs/docker-stack/build-arg-ref.rst                |  8 ++--
 docs/docker-stack/build.rst                        | 35 ++++++++++++++--
 .../extending/writable-directory/Dockerfile        | 21 ++++++++++
 docs/docker-stack/entrypoint.rst                   | 46 +++++++++++++++++++---
 scripts/in_container/prod/entrypoint_prod.sh       | 42 +++++++++++++++++++-
 7 files changed, 141 insertions(+), 15 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index 3928057..2a05964 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -487,7 +487,7 @@ WORKDIR ${AIRFLOW_HOME}
 
 EXPOSE 8080
 
-RUN usermod -g 0 airflow
+RUN usermod -g 0 airflow -G ${AIRFLOW_GID}
 
 USER ${AIRFLOW_UID}
 
diff --git a/chart/values.yaml b/chart/values.yaml
index cbced4f..8516980 100644
--- a/chart/values.yaml
+++ b/chart/values.yaml
@@ -21,7 +21,7 @@
 
 # User and group of airflow user
 uid: 50000
-gid: 50000
+gid: 0
 
 # Airflow home directory
 # Used for mount paths
diff --git a/docs/docker-stack/build-arg-ref.rst b/docs/docker-stack/build-arg-ref.rst
index 2ec04c8..f459cb7 100644
--- a/docs/docker-stack/build-arg-ref.rst
+++ b/docs/docker-stack/build-arg-ref.rst
@@ -51,10 +51,10 @@ Those are the most common arguments that you use when you want to build a custom
 +------------------------------------------+------------------------------------------+------------------------------------------+
 | ``AIRFLOW_UID``                          | ``50000``                                | Airflow user UID.                        |
 +------------------------------------------+------------------------------------------+------------------------------------------+
-| ``AIRFLOW_GID``                          | ``50000``                                | Airflow group GID. Note that most files  |
-|                                          |                                          | created on behalf of airflow user belong |
-|                                          |                                          | to the ``root`` group (0) to keep        |
-|                                          |                                          | OpenShift Guidelines compatibility.      |
+| ``AIRFLOW_GID``                          | ``50000``                                | Airflow group GID. Note that writable    |
+|                                          |                                          | files/dirs, created on behalf of airflow |
+|                                          |                                          | user are set to the ``root`` group (0)   |
+|                                          |                                          | to allow arbitrary UID to run the image. |
 +------------------------------------------+------------------------------------------+------------------------------------------+
 | ``AIRFLOW_CONSTRAINTS_REFERENCE``        |                                          | Reference (branch or tag) from GitHub    |
 |                                          |                                          | where constraints file is taken from     |
diff --git a/docs/docker-stack/build.rst b/docs/docker-stack/build.rst
index a07a837..5fa0a59 100644
--- a/docs/docker-stack/build.rst
+++ b/docs/docker-stack/build.rst
@@ -89,6 +89,11 @@ You should be aware, about a few things:
   PIP packages are installed to ``~/.local`` folder as if the ``--user`` flag was specified when running PIP.
   Note also that using ``--no-cache-dir`` is a good idea that can help to make your image smaller.
 
+.. note::
+  Only as of the ``2.0.1`` image is the ``--user`` flag turned on by default, by setting the ``PIP_USER``
+  environment variable to ``true``. This can be disabled by un-setting the variable or by setting it to
+  ``false``. In the 2.0.0 image you had to add the ``--user`` flag yourself, as in ``pip install --user``.
+
 * If your apt, or PyPI dependencies require some of the ``build-essential`` or other packages that need
   to compile your python dependencies, then your best choice is to follow the "Customize the image" route,
   because you can build a highly-optimized (for size) image this way. However it requires to checkout sources
@@ -103,10 +108,22 @@ You should be aware, about a few things:
   a command ``docker build . --tag my-image:my-tag`` (where ``my-image`` is the name you want to name it
   and ``my-tag`` is the tag you want to tag the image with.
 
+* If your way of extending the image requires creating writable directories, you MUST remember to add
+  a ``umask 0002`` step in your RUN command. This is necessary to accommodate our approach for
+  running the image with an arbitrary user. Such a user will always run with ``GID=0`` -
+  the entrypoint will prevent non-root GIDs. You can read more about it in the
+  :ref:`arbitrary docker user <arbitrary-docker-user>` documentation for the entrypoint. The
+  ``umask 0002`` is set by default when you enter the image, so any directories you create at
+  runtime will have ``GID=0`` and will be group-writable.
+
 .. note::
-  As of 2.0.1 image the ``--user`` flag is turned on by default by setting ``PIP_USER`` environment variable
-  to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``. In the
-  2.0.0 image you had to add the ``--user`` flag as ``pip install --user`` command.
+  Only as of ``2.0.2`` is the default group of the ``airflow`` user ``root``. Previously it was ``airflow``,
+  so if you are building your images based on an earlier image, you need to manually change the default
+  group for the ``airflow`` user:
+
+.. code-block:: docker
+
+    RUN usermod -g 0 airflow
 
 Examples of image extending
 ---------------------------
@@ -131,6 +148,18 @@ The following example adds ``lxml`` python package from PyPI to the image.
     :start-after: [START Dockerfile]
     :end-before: [END Dockerfile]
 
+A ``umask`` requiring example
+.............................
+
+The following example adds a new directory that is supposed to be writable for any arbitrary user
+running the container.
+
+.. exampleinclude:: docker-examples/extending/writable-directory/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
+
+
 A ``build-essential`` requiring package example
 ...............................................
 
diff --git a/docs/docker-stack/docker-examples/extending/writable-directory/Dockerfile b/docs/docker-stack/docker-examples/extending/writable-directory/Dockerfile
new file mode 100644
index 0000000..76c6535
--- /dev/null
+++ b/docs/docker-stack/docker-examples/extending/writable-directory/Dockerfile
@@ -0,0 +1,21 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This is an example Dockerfile. It is not intended for PRODUCTION use
+# [START Dockerfile]
+FROM apache/airflow:2.0.1
+RUN umask 0002; \
+    mkdir -p ~/writeable-directory
+# [END Dockerfile]
diff --git a/docs/docker-stack/entrypoint.rst b/docs/docker-stack/entrypoint.rst
index 829b37e..cc89872 100644
--- a/docs/docker-stack/entrypoint.rst
+++ b/docs/docker-stack/entrypoint.rst
@@ -87,13 +87,49 @@ The image entrypoint works as follows:
   command to execute and result of this evaluation is used as ``AIRFLOW__CELERY__BROKER_URL``. The
   ``_CMD`` variable takes precedence over the ``AIRFLOW__CELERY__BROKER_URL`` variable.
 
-Creating system user
---------------------
+.. _arbitrary-docker-user:
+
+Allowing arbitrary user to run the container
+--------------------------------------------
+
+The Airflow image is OpenShift compatible, which means that you can start it with a random user ID and the
+group id ``0`` (``root``). If you want to run the image with a user other than ``airflow``, you MUST set the
+GID of the user to ``0``. If you try to use a different group, the entrypoint exits with an error.
+
+In order to accommodate a number of external libraries and projects, Airflow will automatically create
+such an arbitrary user in (`/etc/passwd`) and make its home directory point to ``/home/airflow``.
+Many 3rd-party libraries and packages require the user's home directory to be present, because they
+need to write some cache information there, so such dynamic creation of a user is necessary.
+
+Such an arbitrary user has to be able to write to certain directories that need write access, and since
+it is not advised to allow write access to "other" for security reasons, the OpenShift
+guidelines introduced the concept of making all such folders have the ``0`` (``root``) group id (GID).
+Following that concept, all the directories that need write access in the Airflow production image
+have their GID set to 0 and are writable for the group.
+
+GID=0 is set as the default group for the ``airflow`` user, so any directories it creates have GID set to 0
+by default. The entrypoint sets ``umask`` to ``0002`` - this means that any directories created by
+the user also have "group write" access for group ``0`` - they will be writable by other users with the
+``root`` group. Also, whenever an "arbitrary" user creates a folder (for example in a mounted volume), that
+folder will have "group write" access and ``GID=0``, so that execution with another arbitrary user
+will still continue to work, even if such a directory is mounted by another arbitrary user later.
+
+The ``umask`` setting, however, only applies at container runtime - it is not used while the image is
+being built. If you would like to extend the image and add your own packages, you should remember to add
+``umask 0002`` in front of your docker command - this way the directories created by any installation
+that need group access will also be writable for the group. This can be done, for example, like this:
+
+  .. code-block:: docker
+
+      RUN umask 0002; \
+          do_something; \
+          do_otherthing;
+
 
-Airflow image is Open-Shift compatible, which means that you can start it with random user ID and group id 0.
-Airflow will automatically create such a user and make it's home directory point to ``/home/airflow``.
 You can read more about it in the "Support arbitrary user ids" chapter in the
-`Openshift best practices <https://docs.openshift.com/container-platform/4.1/openshift_images/create-images.html#images-create-guide-openshift_create-images>`_.
+`Openshift best practices <https://docs.openshift.com/container-platform/4.7/openshift_images/create-images.html#images-create-guide-openshift_create-images>`_.
+
 
 Waits for Airflow DB connection
 -------------------------------
diff --git a/scripts/in_container/prod/entrypoint_prod.sh b/scripts/in_container/prod/entrypoint_prod.sh
index 12d18e8..4ca8a75 100755
--- a/scripts/in_container/prod/entrypoint_prod.sh
+++ b/scripts/in_container/prod/entrypoint_prod.sh
@@ -15,7 +15,6 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
 # Might be empty
 AIRFLOW_COMMAND="${1}"
 
@@ -244,6 +243,47 @@ function exec_to_bash_or_python_command_if_specified() {
     fi
 }
 
+function check_uid_gid() {
+    if [[ $(id -g) == "0" ]]; then
+        return
+    fi
+    if [[ $(id -u) == "50000" ]]; then
+        >&2 echo
+        >&2 echo "WARNING! You should run the image with GID (Group ID) set to 0"
+        >&2 echo "         even if you use 'airflow' user (UID=50000)"
+        >&2 echo
+        >&2 echo " You started the image with UID=$(id -u) and GID=$(id -g)"
+        >&2 echo
+        >&2 echo " This is to make sure you can run the image with an arbitrary UID in the future."
+        >&2 echo
+        >&2 echo " See more about it in the Airflow's docker image documentation"
+        >&2 echo "     http://airflow.apache.org/docs/docker-stack/entrypoint"
+        >&2 echo
+        # We still allow the image to run with `airflow` user.
+        return
+    else
+        >&2 echo
+        >&2 echo "ERROR! You should run the image with GID=0"
+        >&2 echo
+        >&2 echo " You started the image with UID=$(id -u) and GID=$(id -g)"
+        >&2 echo
+        >&2 echo "The image should always be run with GID (Group ID) set to 0 regardless of the UID used."
+        >&2 echo " This is to make sure you can run the image with an arbitrary UID."
+        >&2 echo
+        >&2 echo " See more about it in the Airflow's docker image documentation"
+        >&2 echo "     http://airflow.apache.org/docs/docker-stack/entrypoint"
+        # This will not work so we fail hard
+        exit 1
+    fi
+}
+
+check_uid_gid
+
+# Set umask to 0002 to make all the directories created by the current user group-writeable
+# This allows the same directories to be writeable for any arbitrary user the image will be
+# run with, when the directory is created on a mounted volume and when that volume is later
+# reused with a different UID (but with GID=0)
+umask 0002
 
 CONNECTION_CHECK_MAX_COUNT=${CONNECTION_CHECK_MAX_COUNT:=20}
 readonly CONNECTION_CHECK_MAX_COUNT

[airflow] 26/36: Fix mistake and typos in doc/docstrings (#15180)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 6b0421b6bbf8d647a340efd51e21f92e1ac5f16e
Author: Xiaodong DENG <xd...@apache.org>
AuthorDate: Sun Apr 4 19:44:03 2021 +0200

    Fix mistake and typos in doc/docstrings (#15180)
    
    - Fix an apparent mistake in doc relating to catchup
    - Fix typo pickable (should be picklable)
    
    (cherry picked from commit 53dafa593fd7ce0be2a48dc9a9e993bb42b6abc5)
---
 airflow/providers/apache/hive/hooks/hive.py | 2 +-
 airflow/utils/timezone.py                   | 4 ++--
 docs/apache-airflow/dag-run.rst             | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/airflow/providers/apache/hive/hooks/hive.py b/airflow/providers/apache/hive/hooks/hive.py
index ab7b7b7..d261ab2 100644
--- a/airflow/providers/apache/hive/hooks/hive.py
+++ b/airflow/providers/apache/hive/hooks/hive.py
@@ -487,7 +487,7 @@ class HiveMetastoreHook(BaseHook):
 
     def __getstate__(self) -> Dict[str, Any]:
         # This is for pickling to work despite the thrift hive client not
-        # being pickable
+        # being picklable
         state = dict(self.__dict__)
         del state['metastore']
         return state
diff --git a/airflow/utils/timezone.py b/airflow/utils/timezone.py
index d302cbe..09736e5 100644
--- a/airflow/utils/timezone.py
+++ b/airflow/utils/timezone.py
@@ -56,7 +56,7 @@ def utcnow() -> dt.datetime:
     :return:
     """
     # pendulum utcnow() is not used as that sets a TimezoneInfo object
-    # instead of a Timezone. This is not pickable and also creates issues
+    # instead of a Timezone. This is not picklable and also creates issues
     # when using replace()
     result = dt.datetime.utcnow()
     result = result.replace(tzinfo=utc)
@@ -71,7 +71,7 @@ def utc_epoch() -> dt.datetime:
     :return:
     """
     # pendulum utcnow() is not used as that sets a TimezoneInfo object
-    # instead of a Timezone. This is not pickable and also creates issues
+    # instead of a Timezone. This is not picklable and also creates issues
     # when using replace()
     result = dt.datetime(1970, 1, 1)
     result = result.replace(tzinfo=utc)
diff --git a/docs/apache-airflow/dag-run.rst b/docs/apache-airflow/dag-run.rst
index dbcf68a..0752990 100644
--- a/docs/apache-airflow/dag-run.rst
+++ b/docs/apache-airflow/dag-run.rst
@@ -80,7 +80,7 @@ An Airflow DAG with a ``start_date``, possibly an ``end_date``, and a ``sched
 series of intervals which the scheduler turns into individual DAG Runs and executes. The scheduler, by default, will
 kick off a DAG Run for any interval that has not been run since the last execution date (or has been cleared). This concept is called Catchup.
 
-If your DAG is written to handle its catchup (i.e., not limited to the interval, but instead to ``Now`` for instance.),
+If your DAG is not written to handle its catchup (i.e., not limited to the interval, but instead to ``Now`` for instance.),
 then you will want to turn catchup off. This can be done by setting ``catchup = False`` in DAG  or ``catchup_by_default = False``
 in the configuration file. When turned off, the scheduler creates a DAG run only for the latest interval.
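
A minimal sketch (editorial, not part of this commit) of a DAG that works against "now" rather than the
data interval and therefore disables catchup; the DAG id and task are placeholders:

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.utils.dates import days_ago

    with DAG(
        dag_id="no_catchup_example",  # hypothetical DAG id
        start_date=days_ago(30),
        schedule_interval="@daily",
        catchup=False,  # only the latest interval runs; the past 30 days are not backfilled
    ) as dag:
        # The task snapshots "now", so re-running old intervals would add nothing.
        BashOperator(task_id="snapshot_now", bash_command="date")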
 

[airflow] 18/36: Add a note in set-config.rst on using Secrets Backend (#15274)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 24503cac350791243910f78f7e510479006d9a4e
Author: Lauri Koobas <la...@users.noreply.github.com>
AuthorDate: Thu Apr 8 16:25:49 2021 +0300

    Add a note in set-config.rst on using Secrets Backend (#15274)
    
    Clarify in the documentation that configuration options which hold connection strings (sql_alchemy_conn for example) should still be defined as config options in the secrets backend, not as connection options.
    
    (cherry picked from commit 1e1f9afa99aace2fc692cffde1e71bd6d9873f24)
---
 docs/apache-airflow/howto/set-config.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/apache-airflow/howto/set-config.rst b/docs/apache-airflow/howto/set-config.rst
index f1aac87..d36a96f 100644
--- a/docs/apache-airflow/howto/set-config.rst
+++ b/docs/apache-airflow/howto/set-config.rst
@@ -85,6 +85,9 @@ For example:
 
     export AIRFLOW__CORE__SQL_ALCHEMY_CONN_SECRET=sql_alchemy_conn
 
+.. note::
+    The config options must follow the config prefix naming convention defined within the secrets backend. This means that ``sql_alchemy_conn`` is not defined with a connection prefix, but with the config prefix. For example, it should be named ``airflow/config/sql_alchemy_conn``.
+
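
An editorial illustration of that naming convention (not part of this commit): a toy secrets backend
showing that config values are looked up under the config prefix while connections live under the
connections prefix. It assumes the ``BaseSecretsBackend`` interface with ``get_config``/``get_conn_uri``
hooks; the in-memory dict stands in for a real secret store.

    from typing import Optional

    from airflow.secrets.base_secrets import BaseSecretsBackend


    class DictSecretsBackend(BaseSecretsBackend):
        """Toy backend: config keys use the config prefix, not the connections prefix."""

        _store = {
            "airflow/config/sql_alchemy_conn": "postgresql+psycopg2://user:pass@db/airflow",
            "airflow/connections/my_api": "http://user:pass@api.example.com",
        }

        def get_config(self, key: str) -> Optional[str]:
            # AIRFLOW__CORE__SQL_ALCHEMY_CONN_SECRET=sql_alchemy_conn resolves through here
            return self._store.get(f"airflow/config/{key}")

        def get_conn_uri(self, conn_id: str) -> Optional[str]:
            return self._store.get(f"airflow/connections/{conn_id}")

        def get_variable(self, key: str) -> Optional[str]:
            return self._store.get(f"airflow/variables/{key}")
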
 The idea behind this is to not store passwords on boxes in plain text files.
 
 The universal order of precedence for all configuration options is as follows:

[airflow] 01/36: Adds Blinker dependency which is missing after recent changes (#15182)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 2ae87ec27d3ec48076234f477b9e5d087ec3ae98
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Sun Apr 4 01:57:56 2021 +0200

    Adds Blinker dependency which is missing after recent changes (#15182)
    
    This PR fixes a problem introduced by #14144
    
    This is a very weird and unforeseen issue. The change introduced a
    new import from flask `before_render_template` and this caused
    flask to require the `blinker` dependency, even though it was not
    previously specified as 'required' by flask. We had not seen it
    before because changes to this part of the code do not trigger
    K8S tests; however, subsequent PRs started to fail because
    setup.py did not list `blinker` as a dependency.
    
    However in CI image `blinker` was installed because it is
    needed by sentry. So the problem was only detectable in the
    production image.
    
    This is ultimate proof that our test harness is really good at
    catching this kind of error.
    
    The root cause for it is described in
    https://stackoverflow.com/questions/38491075/flask-testing-signals-not-supported-error
    
    Flask support for signals is optional and it does not list blinker as
    a dependency, but importing some parts of flask triggers the need
    for signals.
    
    (cherry picked from commit 437850bd16ea71421613ce9ab361bafec90b7ece)
---
 setup.cfg | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.cfg b/setup.cfg
index ed533ca..fbb2276 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -82,6 +82,7 @@ install_requires =
     alembic>=1.2, <2.0
     argcomplete~=1.10
     attrs>=20.0, <21.0
+    blinker
     cached_property~=1.5
     # cattrs >= 1.1.0 dropped support for Python 3.6
     cattrs>=1.0, <1.1.0;python_version<="3.6"

[airflow] 12/36: Adds new Airbyte provider (#14492)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit a05247b85a60eee94885a9ad929cc5aac8deb33b
Author: Marcos Marx <ma...@users.noreply.github.com>
AuthorDate: Sat Mar 6 11:19:30 2021 -0300

    Adds new Airbyte provider (#14492)
    
    This commit add hook, operators and sensors to interact with
    Airbyte external service.
    
    (cherry picked from commit 20b72aea4dc1e25f2aa3cfe62b45ca1ff29d1cbb)
---
 CONTRIBUTING.rst                                   |  23 ++--
 INSTALL                                            |  22 ++--
 airflow/providers/airbyte/CHANGELOG.rst            |  25 ++++
 airflow/providers/airbyte/__init__.py              |  17 +++
 airflow/providers/airbyte/example_dags/__init__.py |  16 +++
 .../example_dags/example_airbyte_trigger_job.py    |  64 +++++++++++
 airflow/providers/airbyte/hooks/__init__.py        |  17 +++
 airflow/providers/airbyte/hooks/airbyte.py         | 109 ++++++++++++++++++
 airflow/providers/airbyte/operators/__init__.py    |  17 +++
 airflow/providers/airbyte/operators/airbyte.py     |  85 ++++++++++++++
 airflow/providers/airbyte/provider.yaml            |  51 +++++++++
 airflow/providers/airbyte/sensors/__init__.py      |  16 +++
 airflow/providers/airbyte/sensors/airbyte.py       |  73 ++++++++++++
 airflow/providers/dependencies.json                |   3 +
 docs/apache-airflow-providers-airbyte/commits.rst  |  27 +++++
 .../connections.rst                                |  36 ++++++
 docs/apache-airflow-providers-airbyte/index.rst    | 121 ++++++++++++++++++++
 .../operators/airbyte.rst                          |  58 ++++++++++
 docs/apache-airflow/extra-packages-ref.rst         |   2 +
 docs/integration-logos/airbyte/Airbyte.png         | Bin 0 -> 7405 bytes
 docs/spelling_wordlist.txt                         |   2 +
 setup.py                                           |   1 +
 tests/core/test_providers_manager.py               |   1 +
 tests/providers/airbyte/__init__.py                |  16 +++
 tests/providers/airbyte/hooks/__init__.py          |  16 +++
 tests/providers/airbyte/hooks/test_airbyte.py      | 126 +++++++++++++++++++++
 tests/providers/airbyte/operators/__init__.py      |  16 +++
 tests/providers/airbyte/operators/test_airbyte.py  |  55 +++++++++
 tests/providers/airbyte/sensors/__init__.py        |  16 +++
 tests/providers/airbyte/sensors/test_airbyte.py    |  93 +++++++++++++++
 30 files changed, 1102 insertions(+), 22 deletions(-)

diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
index e82fd4e..7ac115c 100644
--- a/CONTRIBUTING.rst
+++ b/CONTRIBUTING.rst
@@ -585,17 +585,17 @@ This is the full list of those extras:
 
   .. START EXTRAS HERE
 
-all, all_dbs, amazon, apache.atlas, apache.beam, apache.cassandra, apache.druid, apache.hdfs,
-apache.hive, apache.kylin, apache.livy, apache.pig, apache.pinot, apache.spark, apache.sqoop,
-apache.webhdfs, async, atlas, aws, azure, cassandra, celery, cgroups, cloudant, cncf.kubernetes,
-crypto, dask, databricks, datadog, devel, devel_all, devel_ci, devel_hadoop, dingding, discord, doc,
-docker, druid, elasticsearch, exasol, facebook, ftp, gcp, gcp_api, github_enterprise, google,
-google_auth, grpc, hashicorp, hdfs, hive, http, imap, jdbc, jenkins, jira, kerberos, kubernetes,
-ldap, microsoft.azure, microsoft.mssql, microsoft.winrm, mongo, mssql, mysql, neo4j, odbc, openfaas,
-opsgenie, oracle, pagerduty, papermill, password, pinot, plexus, postgres, presto, qds, qubole,
-rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry, sftp, singularity, slack,
-snowflake, spark, sqlite, ssh, statsd, tableau, telegram, trino, vertica, virtualenv, webhdfs,
-winrm, yandex, zendesk
+airbyte, all, all_dbs, amazon, apache.atlas, apache.beam, apache.cassandra, apache.druid,
+apache.hdfs, apache.hive, apache.kylin, apache.livy, apache.pig, apache.pinot, apache.spark,
+apache.sqoop, apache.webhdfs, async, atlas, aws, azure, cassandra, celery, cgroups, cloudant,
+cncf.kubernetes, crypto, dask, databricks, datadog, devel, devel_all, devel_ci, devel_hadoop,
+dingding, discord, doc, docker, druid, elasticsearch, exasol, facebook, ftp, gcp, gcp_api,
+github_enterprise, google, google_auth, grpc, hashicorp, hdfs, hive, http, imap, jdbc, jenkins,
+jira, kerberos, kubernetes, ldap, microsoft.azure, microsoft.mssql, microsoft.winrm, mongo, mssql,
+mysql, neo4j, odbc, openfaas, opsgenie, oracle, pagerduty, papermill, password, pinot, plexus,
+postgres, presto, qds, qubole, rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry,
+sftp, singularity, slack, snowflake, spark, sqlite, ssh, statsd, tableau, telegram, trino, vertica,
+virtualenv, webhdfs, winrm, yandex, zendesk
 
   .. END EXTRAS HERE
 
@@ -653,6 +653,7 @@ Here is the list of packages and their extras:
 ========================== ===========================
 Package                    Extras
 ========================== ===========================
+airbyte                    http
 amazon                     apache.hive,google,imap,mongo,mysql,postgres,ssh
 apache.beam                google
 apache.druid               apache.hive
diff --git a/INSTALL b/INSTALL
index 34fccd2..46d15f6 100644
--- a/INSTALL
+++ b/INSTALL
@@ -97,17 +97,17 @@ The list of available extras:
 
 # START EXTRAS HERE
 
-all, all_dbs, amazon, apache.atlas, apache.beam, apache.cassandra, apache.druid, apache.hdfs,
-apache.hive, apache.kylin, apache.livy, apache.pig, apache.pinot, apache.spark, apache.sqoop,
-apache.webhdfs, async, atlas, aws, azure, cassandra, celery, cgroups, cloudant, cncf.kubernetes,
-crypto, dask, databricks, datadog, devel, devel_all, devel_ci, devel_hadoop, dingding, discord, doc,
-docker, druid, elasticsearch, exasol, facebook, ftp, gcp, gcp_api, github_enterprise, google,
-google_auth, grpc, hashicorp, hdfs, hive, http, imap, jdbc, jenkins, jira, kerberos, kubernetes,
-ldap, microsoft.azure, microsoft.mssql, microsoft.winrm, mongo, mssql, mysql, neo4j, odbc, openfaas,
-opsgenie, oracle, pagerduty, papermill, password, pinot, plexus, postgres, presto, qds, qubole,
-rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry, sftp, singularity, slack,
-snowflake, spark, sqlite, ssh, statsd, tableau, telegram, trino, vertica, virtualenv, webhdfs,
-winrm, yandex, zendesk
+airbyte, all, all_dbs, amazon, apache.atlas, apache.beam, apache.cassandra, apache.druid,
+apache.hdfs, apache.hive, apache.kylin, apache.livy, apache.pig, apache.pinot, apache.spark,
+apache.sqoop, apache.webhdfs, async, atlas, aws, azure, cassandra, celery, cgroups, cloudant,
+cncf.kubernetes, crypto, dask, databricks, datadog, devel, devel_all, devel_ci, devel_hadoop,
+dingding, discord, doc, docker, druid, elasticsearch, exasol, facebook, ftp, gcp, gcp_api,
+github_enterprise, google, google_auth, grpc, hashicorp, hdfs, hive, http, imap, jdbc, jenkins,
+jira, kerberos, kubernetes, ldap, microsoft.azure, microsoft.mssql, microsoft.winrm, mongo, mssql,
+mysql, neo4j, odbc, openfaas, opsgenie, oracle, pagerduty, papermill, password, pinot, plexus,
+postgres, presto, qds, qubole, rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry,
+sftp, singularity, slack, snowflake, spark, sqlite, ssh, statsd, tableau, telegram, trino, vertica,
+virtualenv, webhdfs, winrm, yandex, zendesk
 
 # END EXTRAS HERE
 
diff --git a/airflow/providers/airbyte/CHANGELOG.rst b/airflow/providers/airbyte/CHANGELOG.rst
new file mode 100644
index 0000000..cef7dda
--- /dev/null
+++ b/airflow/providers/airbyte/CHANGELOG.rst
@@ -0,0 +1,25 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+Changelog
+---------
+
+1.0.0
+.....
+
+Initial version of the provider.
diff --git a/airflow/providers/airbyte/__init__.py b/airflow/providers/airbyte/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/airflow/providers/airbyte/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/airflow/providers/airbyte/example_dags/__init__.py b/airflow/providers/airbyte/example_dags/__init__.py
new file mode 100644
index 0000000..13a8339
--- /dev/null
+++ b/airflow/providers/airbyte/example_dags/__init__.py
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/airflow/providers/airbyte/example_dags/example_airbyte_trigger_job.py b/airflow/providers/airbyte/example_dags/example_airbyte_trigger_job.py
new file mode 100644
index 0000000..1ac62a8
--- /dev/null
+++ b/airflow/providers/airbyte/example_dags/example_airbyte_trigger_job.py
@@ -0,0 +1,64 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Example DAG demonstrating the usage of the AirbyteTriggerSyncOperator."""
+
+from datetime import timedelta
+
+from airflow import DAG
+from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator
+from airflow.providers.airbyte.sensors.airbyte import AirbyteJobSensor
+from airflow.utils.dates import days_ago
+
+args = {
+    'owner': 'airflow',
+}
+
+with DAG(
+    dag_id='example_airbyte_operator',
+    default_args=args,
+    schedule_interval=None,
+    start_date=days_ago(1),
+    dagrun_timeout=timedelta(minutes=60),
+    tags=['example'],
+) as dag:
+
+    # [START howto_operator_airbyte_synchronous]
+    sync_source_destination = AirbyteTriggerSyncOperator(
+        task_id='airbyte_sync_source_dest_example',
+        airbyte_conn_id='airbyte_default',
+        connection_id='15bc3800-82e4-48c3-a32d-620661273f28',
+    )
+    # [END howto_operator_airbyte_synchronous]
+
+    # [START howto_operator_airbyte_asynchronous]
+    async_source_destination = AirbyteTriggerSyncOperator(
+        task_id='airbyte_async_source_dest_example',
+        airbyte_conn_id='airbyte_default',
+        connection_id='15bc3800-82e4-48c3-a32d-620661273f28',
+        asynchronous=True,
+    )
+
+    airbyte_sensor = AirbyteJobSensor(
+        task_id='airbyte_sensor_source_dest_example',
+        airbyte_job_id="{{task_instance.xcom_pull(task_ids='airbyte_async_source_dest_example')}}",
+        airbyte_conn_id='airbyte_default',
+    )
+    # [END howto_operator_airbyte_asynchronous]
+
+    async_source_destination >> airbyte_sensor
diff --git a/airflow/providers/airbyte/hooks/__init__.py b/airflow/providers/airbyte/hooks/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/airflow/providers/airbyte/hooks/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/airflow/providers/airbyte/hooks/airbyte.py b/airflow/providers/airbyte/hooks/airbyte.py
new file mode 100644
index 0000000..0aeb4f8
--- /dev/null
+++ b/airflow/providers/airbyte/hooks/airbyte.py
@@ -0,0 +1,109 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import time
+from typing import Any, Optional
+
+from airflow.exceptions import AirflowException
+from airflow.providers.http.hooks.http import HttpHook
+
+
+class AirbyteHook(HttpHook):
+    """
+    Hook for Airbyte API
+
+    :param airbyte_conn_id: Required. The name of the Airflow connection to get
+        connection information for Airbyte.
+    :type airbyte_conn_id: str
+    :param api_version: Optional. Airbyte API version.
+    :type api_version: str
+    """
+
+    RUNNING = "running"
+    SUCCEEDED = "succeeded"
+    CANCELLED = "cancelled"
+    PENDING = "pending"
+    FAILED = "failed"
+    ERROR = "error"
+
+    def __init__(self, airbyte_conn_id: str = "airbyte_default", api_version: Optional[str] = "v1") -> None:
+        super().__init__(http_conn_id=airbyte_conn_id)
+        self.api_version: str = api_version
+
+    def wait_for_job(
+        self, job_id: str, wait_seconds: Optional[float] = 3, timeout: Optional[float] = 3600
+    ) -> None:
+        """
+        Helper method which polls a job to check if it finishes.
+
+        :param job_id: Required. Id of the Airbyte job
+        :type job_id: str
+        :param wait_seconds: Optional. Number of seconds between checks.
+        :type wait_seconds: float
+        :param timeout: Optional. How many seconds to wait for the job to be ready.
+            Used only if ``asynchronous`` is False.
+        :type timeout: float
+        """
+        state = None
+        start = time.monotonic()
+        while True:
+            if timeout and start + timeout < time.monotonic():
+                raise AirflowException(f"Timeout: Airbyte job {job_id} is not ready after {timeout}s")
+            time.sleep(wait_seconds)
+            try:
+                job = self.get_job(job_id=job_id)
+                state = job.json()["job"]["status"]
+            except AirflowException as err:
+                self.log.info("Retrying. Airbyte API returned server error when waiting for job: %s", err)
+                continue
+
+            if state in (self.RUNNING, self.PENDING):
+                continue
+            if state == self.SUCCEEDED:
+                break
+            if state == self.ERROR:
+                raise AirflowException(f"Job failed:\n{job}")
+            elif state == self.CANCELLED:
+                raise AirflowException(f"Job was cancelled:\n{job}")
+            else:
+                raise Exception(f"Encountered unexpected state `{state}` for job_id `{job_id}`")
+
+    def submit_sync_connection(self, connection_id: str) -> Any:
+        """
+        Submits a job to an Airbyte server.
+
+        :param connection_id: Required. The ConnectionId of the Airbyte Connection.
+        :type connection_id: str
+        """
+        return self.run(
+            endpoint=f"api/{self.api_version}/connections/sync",
+            json={"connectionId": connection_id},
+            headers={"accept": "application/json"},
+        )
+
+    def get_job(self, job_id: int) -> Any:
+        """
+        Gets the resource representation for a job in Airbyte.
+
+        :param job_id: Required. Id of the Airbyte job
+        :type job_id: int
+        """
+        return self.run(
+            endpoint=f"api/{self.api_version}/jobs/get",
+            json={"id": job_id},
+            headers={"accept": "application/json"},
+        )
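
A minimal usage sketch of the hook above (editorial, not part of this commit); the connection id and
the connection UUID are placeholders:

    from airflow.providers.airbyte.hooks.airbyte import AirbyteHook

    hook = AirbyteHook(airbyte_conn_id="airbyte_default")

    # POST api/v1/connections/sync and pull the job id out of the response payload
    response = hook.submit_sync_connection(connection_id="15bc3800-82e4-48c3-a32d-620661273f28")
    job_id = response.json()["job"]["id"]

    # Poll api/v1/jobs/get every 5 seconds, give up after 10 minutes
    hook.wait_for_job(job_id=job_id, wait_seconds=5, timeout=600)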
diff --git a/airflow/providers/airbyte/operators/__init__.py b/airflow/providers/airbyte/operators/__init__.py
new file mode 100644
index 0000000..217e5db
--- /dev/null
+++ b/airflow/providers/airbyte/operators/__init__.py
@@ -0,0 +1,17 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/airflow/providers/airbyte/operators/airbyte.py b/airflow/providers/airbyte/operators/airbyte.py
new file mode 100644
index 0000000..6932fa3
--- /dev/null
+++ b/airflow/providers/airbyte/operators/airbyte.py
@@ -0,0 +1,85 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from typing import Optional
+
+from airflow.models import BaseOperator
+from airflow.providers.airbyte.hooks.airbyte import AirbyteHook
+from airflow.utils.decorators import apply_defaults
+
+
+class AirbyteTriggerSyncOperator(BaseOperator):
+    """
+    This operator allows you to submit a job to an Airbyte server to run an integration
+    process between your source and destination.
+
+    .. seealso::
+        For more information on how to use this operator, take a look at the guide:
+        :ref:`howto/operator:AirbyteTriggerSyncOperator`
+
+    :param airbyte_conn_id: Required. The name of the Airflow connection to get connection
+        information for Airbyte.
+    :type airbyte_conn_id: str
+    :param connection_id: Required. The Airbyte ConnectionId UUID between a source and destination.
+    :type connection_id: str
+    :param asynchronous: Optional. Flag to get job_id after submitting the job to the Airbyte API.
+        This is useful for submitting long running jobs and
+        waiting on them asynchronously using the AirbyteJobSensor.
+    :type asynchronous: bool
+    :param api_version: Optional. Airbyte API version.
+    :type api_version: str
+    :param wait_seconds: Optional. Number of seconds between checks. Only used when ``asynchronous`` is False.
+    :type wait_seconds: float
+    :param timeout: Optional. The amount of time, in seconds, to wait for the request to complete.
+        Only used when ``asynchronous`` is False.
+    :type timeout: float
+    """
+
+    template_fields = ('connection_id',)
+
+    @apply_defaults
+    def __init__(
+        self,
+        connection_id: str,
+        airbyte_conn_id: str = "airbyte_default",
+        asynchronous: Optional[bool] = False,
+        api_version: Optional[str] = "v1",
+        wait_seconds: Optional[float] = 3,
+        timeout: Optional[float] = 3600,
+        **kwargs,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.airbyte_conn_id = airbyte_conn_id
+        self.connection_id = connection_id
+        self.timeout = timeout
+        self.api_version = api_version
+        self.wait_seconds = wait_seconds
+        self.asynchronous = asynchronous
+
+    def execute(self, context) -> None:
+        """Create Airbyte Job and wait to finish"""
+        hook = AirbyteHook(airbyte_conn_id=self.airbyte_conn_id, api_version=self.api_version)
+        job_object = hook.submit_sync_connection(connection_id=self.connection_id)
+        job_id = job_object.json()['job']['id']
+
+        self.log.info("Job %s was submitted to Airbyte Server", job_id)
+        if not self.asynchronous:
+            self.log.info('Waiting for job %s to complete', job_id)
+            hook.wait_for_job(job_id=job_id, wait_seconds=self.wait_seconds, timeout=self.timeout)
+            self.log.info('Job %s completed successfully', job_id)
+
+        return job_id
diff --git a/airflow/providers/airbyte/provider.yaml b/airflow/providers/airbyte/provider.yaml
new file mode 100644
index 0000000..77b109f
--- /dev/null
+++ b/airflow/providers/airbyte/provider.yaml
@@ -0,0 +1,51 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+---
+package-name: apache-airflow-providers-airbyte
+name: Airbyte
+description: |
+  `Airbyte <https://airbyte.io/>`__
+
+versions:
+  - 1.0.0
+
+integrations:
+  - integration-name: Airbyte
+    external-doc-url: https://www.airbyte.io/
+    logo: /integration-logos/airbyte/Airbyte.png
+    how-to-guide:
+      - /docs/apache-airflow-providers-airbyte/operators/airbyte.rst
+    tags: [service]
+
+operators:
+  - integration-name: Airbyte
+    python-modules:
+      - airflow.providers.airbyte.operators.airbyte
+
+hooks:
+  - integration-name: Airbyte
+    python-modules:
+      - airflow.providers.airbyte.hooks.airbyte
+
+sensors:
+  - integration-name: Airbyte
+    python-modules:
+      - airflow.providers.airbyte.sensors.airbyte
+
+hook-class-names:
+  - airflow.providers.airbyte.hooks.airbyte.AirbyteHook
diff --git a/airflow/providers/airbyte/sensors/__init__.py b/airflow/providers/airbyte/sensors/__init__.py
new file mode 100644
index 0000000..13a8339
--- /dev/null
+++ b/airflow/providers/airbyte/sensors/__init__.py
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/airflow/providers/airbyte/sensors/airbyte.py b/airflow/providers/airbyte/sensors/airbyte.py
new file mode 100644
index 0000000..9799ade
--- /dev/null
+++ b/airflow/providers/airbyte/sensors/airbyte.py
@@ -0,0 +1,73 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""This module contains a Airbyte Job sensor."""
+from typing import Optional
+
+from airflow.exceptions import AirflowException
+from airflow.providers.airbyte.hooks.airbyte import AirbyteHook
+from airflow.sensors.base import BaseSensorOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class AirbyteJobSensor(BaseSensorOperator):
+    """
+    Check for the state of a previously submitted Airbyte job.
+
+    :param airbyte_job_id: Required. Id of the Airbyte job
+    :type airbyte_job_id: str
+    :param airbyte_conn_id: Required. The name of the Airflow connection to get
+        connection information for Airbyte.
+    :type airbyte_conn_id: str
+    :param api_version: Optional. Airbyte API version.
+    :type api_version: str
+    """
+
+    template_fields = ('airbyte_job_id',)
+    ui_color = '#6C51FD'
+
+    @apply_defaults
+    def __init__(
+        self,
+        *,
+        airbyte_job_id: str,
+        airbyte_conn_id: str = 'airbyte_default',
+        api_version: Optional[str] = "v1",
+        **kwargs,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.airbyte_conn_id = airbyte_conn_id
+        self.airbyte_job_id = airbyte_job_id
+        self.api_version = api_version
+
+    def poke(self, context: dict) -> bool:
+        hook = AirbyteHook(airbyte_conn_id=self.airbyte_conn_id, api_version=self.api_version)
+        job = hook.get_job(job_id=self.airbyte_job_id)
+        status = job.json()['job']['status']
+
+        if status == hook.FAILED:
+            raise AirflowException(f"Job failed: \n{job}")
+        elif status == hook.CANCELLED:
+            raise AirflowException(f"Job was cancelled: \n{job}")
+        elif status == hook.SUCCEEDED:
+            self.log.info("Job %s completed successfully.", self.airbyte_job_id)
+            return True
+        elif status == hook.ERROR:
+            self.log.info("Job %s attempt has failed.", self.airbyte_job_id)
+
+        self.log.info("Waiting for job %s to complete.", self.airbyte_job_id)
+        return False
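
An editorial sketch (not part of this commit): because Airbyte syncs can run for a long time, the sensor
above can be combined with the standard ``BaseSensorOperator`` options (``mode="reschedule"``,
``poke_interval``, ``timeout``) so it does not hold a worker slot between checks. The DAG id and the
upstream task id are placeholders:

    from airflow import DAG
    from airflow.providers.airbyte.sensors.airbyte import AirbyteJobSensor
    from airflow.utils.dates import days_ago

    with DAG(
        dag_id="airbyte_sensor_reschedule_sketch",  # hypothetical DAG id
        start_date=days_ago(1),
        schedule_interval=None,
    ) as dag:
        wait_for_sync = AirbyteJobSensor(
            task_id="airbyte_sensor_source_dest_example",
            airbyte_conn_id="airbyte_default",
            # job id returned by an upstream AirbyteTriggerSyncOperator run with asynchronous=True
            airbyte_job_id="{{ task_instance.xcom_pull(task_ids='airbyte_async_source_dest_example') }}",
            mode="reschedule",   # release the worker slot between pokes
            poke_interval=60,    # check once a minute
            timeout=60 * 60,     # give up after an hour of waiting
        )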
diff --git a/airflow/providers/dependencies.json b/airflow/providers/dependencies.json
index 81a3ba4..6027656 100644
--- a/airflow/providers/dependencies.json
+++ b/airflow/providers/dependencies.json
@@ -1,4 +1,7 @@
 {
+  "airbyte": [
+    "http"
+  ],
   "amazon": [
     "apache.hive",
     "google",
diff --git a/docs/apache-airflow-providers-airbyte/commits.rst b/docs/apache-airflow-providers-airbyte/commits.rst
new file mode 100644
index 0000000..cae1272
--- /dev/null
+++ b/docs/apache-airflow-providers-airbyte/commits.rst
@@ -0,0 +1,27 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+Package apache-airflow-providers-airbyte
+----------------------------------------
+
+`Airbyte <https://airbyte.io/>`__
+
+
+This is the detailed commit list of changes for the ``airbyte`` provider package.
+For high-level changelog, see :doc:`package information including changelog <index>`.
diff --git a/docs/apache-airflow-providers-airbyte/connections.rst b/docs/apache-airflow-providers-airbyte/connections.rst
new file mode 100644
index 0000000..31b69c7
--- /dev/null
+++ b/docs/apache-airflow-providers-airbyte/connections.rst
@@ -0,0 +1,36 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Airbyte Connection
+==================
+The Airbyte connection type uses the HTTP protocol.
+
+Configuring the Connection
+--------------------------
+Host (required)
+    The host of the Airbyte server to connect to.
+
+Port (required)
+    The port for the Airbyte server.
+
+Login (optional)
+    Specify the user name to connect.
+
+Password (optional)
+    Specify the password to connect.
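
An editorial aside (not part of this commit): besides the UI, such a connection can be created
programmatically, mirroring what the provider's tests in this change do; the host, port and
credentials below are placeholders.

    from airflow.models import Connection
    from airflow.utils import db

    db.merge_conn(
        Connection(
            conn_id="airbyte_default",
            conn_type="http",
            host="http://airbyte-server",  # placeholder host
            port=8001,                     # placeholder port
            login="airbyte",               # optional
            password="password",           # optional
        )
    )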
diff --git a/docs/apache-airflow-providers-airbyte/index.rst b/docs/apache-airflow-providers-airbyte/index.rst
new file mode 100644
index 0000000..d83f5e0
--- /dev/null
+++ b/docs/apache-airflow-providers-airbyte/index.rst
@@ -0,0 +1,121 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+``apache-airflow-providers-airbyte``
+====================================
+
+Content
+-------
+
+.. toctree::
+    :maxdepth: 1
+    :caption: Guides
+
+    Operators <operators/airbyte>
+    Connection types <connections>
+
+.. toctree::
+    :maxdepth: 1
+    :caption: References
+
+    Python API <_api/airflow/providers/airbyte/index>
+
+.. toctree::
+    :maxdepth: 1
+    :caption: Resources
+
+    Example DAGs <https://github.com/apache/airflow/tree/master/airflow/providers/airbyte/example_dags>
+    PyPI Repository <https://pypi.org/project/apache-airflow-providers-airbyte/>
+
+.. toctree::
+    :maxdepth: 1
+    :caption: Commits
+
+    Detailed list of commits <commits>
+
+Package apache-airflow-providers-airbyte
+----------------------------------------
+
+`Airbyte <https://www.airbyte.io/>`__
+
+
+Release: 1.0.0
+
+Provider package
+----------------
+
+This is a provider package for ``airbyte`` provider. All classes for this provider package
+are in ``airflow.providers.airbyte`` python package.
+
+Installation
+------------
+
+.. note::
+
+    In November 2020, a new version of pip (20.3) was released with a new 2020 resolver. This resolver
+    does not yet work with Apache Airflow and might lead to installation errors, depending on your choice
+    of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4 with
+    ``pip install --upgrade pip==20.2.4`` or, in case you use pip 20.3, add the option
+    ``--use-deprecated legacy-resolver`` to your pip install command.
+
+
+You can install this package on top of an existing airflow 2.* installation via
+``pip install apache-airflow-providers-airbyte``
+
+Cross provider package dependencies
+-----------------------------------
+
+These are dependencies that might be needed in order to use all the features of the package.
+You need to install the specified provider packages in order to use them.
+
+You can install such cross-provider dependencies when installing from PyPI. For example:
+
+.. code-block:: bash
+
+    pip install apache-airflow-providers-airbyte[http]
+
+
+================================================================================================  ========
+Dependent package                                                                                 Extra
+================================================================================================  ========
+`apache-airflow-providers-http <https://airflow.apache.org/docs/apache-airflow-providers-http>`_  ``http``
+================================================================================================  ========
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Changelog
+---------
+
+1.0.0
+.....
+
+Initial version of the provider.
diff --git a/docs/apache-airflow-providers-airbyte/operators/airbyte.rst b/docs/apache-airflow-providers-airbyte/operators/airbyte.rst
new file mode 100644
index 0000000..b674627
--- /dev/null
+++ b/docs/apache-airflow-providers-airbyte/operators/airbyte.rst
@@ -0,0 +1,58 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+.. _howto/operator:AirbyteTriggerSyncOperator:
+
+AirbyteTriggerSyncOperator
+==========================
+
+Use the :class:`~airflow.providers.airbyte.operators.airbyte.AirbyteTriggerSyncOperator` to
+trigger an existing ConnectionId sync job in Airbyte.
+
+.. warning::
+  This operator triggers a synchronization job in Airbyte.
+  If triggered again, this operator does not guarantee idempotency.
+  You must be aware of the source (database, API, etc.) you are updating/syncing and
+  the method applied to perform the operation in Airbyte.
+
+
+Using the Operator
+^^^^^^^^^^^^^^^^^^
+
+The AirbyteTriggerSyncOperator requires the ``connection_id``: the UUID of the connection
+created in Airbyte between a source and a destination.
+Use the ``airbyte_conn_id`` parameter to specify the Airflow connection used to
+connect to your Airbyte instance.
+
+You can trigger a synchronization job in Airflow in two ways with the operator. The first is
+synchronous: the operator triggers the Airbyte job and manages its status until it finishes.
+The second is asynchronous: pass ``asynchronous=True`` so the operator only triggers the job and
+returns the ``job_id``, which should be passed to the AirbyteJobSensor.
+
+An example using the synchronous way:
+
+.. exampleinclude:: /../../airflow/providers/airbyte/example_dags/example_airbyte_trigger_job.py
+    :language: python
+    :start-after: [START howto_operator_airbyte_synchronous]
+    :end-before: [END howto_operator_airbyte_synchronous]
+
+An example using the asynchronous way:
+
+.. exampleinclude:: /../../airflow/providers/airbyte/example_dags/example_airbyte_trigger_job.py
+    :language: python
+    :start-after: [START howto_operator_airbyte_asynchronous]
+    :end-before: [END howto_operator_airbyte_asynchronous]
diff --git a/docs/apache-airflow/extra-packages-ref.rst b/docs/apache-airflow/extra-packages-ref.rst
index 601c6bc..b902868 100644
--- a/docs/apache-airflow/extra-packages-ref.rst
+++ b/docs/apache-airflow/extra-packages-ref.rst
@@ -141,6 +141,8 @@ Those are extras that add dependencies needed for integration with external serv
 +---------------------+-----------------------------------------------------+-----------------------------------------------------+
 | extra               | install command                                     | enables                                             |
 +=====================+=====================================================+=====================================================+
+| airbyte             | ``pip install 'apache-airflow[airbyte]'``           | Airbyte hooks and operators                         |
++---------------------+-----------------------------------------------------+-----------------------------------------------------+
 | amazon              | ``pip install 'apache-airflow[amazon]'``            | Amazon Web Services                                 |
 +---------------------+-----------------------------------------------------+-----------------------------------------------------+
 | azure               | ``pip install 'apache-airflow[microsoft.azure]'``   | Microsoft Azure                                     |
diff --git a/docs/integration-logos/airbyte/Airbyte.png b/docs/integration-logos/airbyte/Airbyte.png
new file mode 100644
index 0000000..0cc1d07
Binary files /dev/null and b/docs/integration-logos/airbyte/Airbyte.png differ
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index ace29d3..2ebb5d1 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -1,6 +1,7 @@
 Ack
 Acyclic
 Airbnb
+Airbyte
 AirflowException
 Aizhamal
 Alphasort
@@ -420,6 +421,7 @@ acyclic
 adhoc
 aijamalnk
 airbnb
+airbyte
 airfl
 airflowignore
 ajax
diff --git a/setup.py b/setup.py
index 9ccd60e..5ec7d37 100644
--- a/setup.py
+++ b/setup.py
@@ -523,6 +523,7 @@ devel_hadoop = devel_minreq + hdfs + hive + kerberos + presto + webhdfs
 
 # Dict of all providers which are part of the Apache Airflow repository together with their requirements
 PROVIDERS_REQUIREMENTS: Dict[str, List[str]] = {
+    'airbyte': [],
     'amazon': amazon,
     'apache.beam': apache_beam,
     'apache.cassandra': cassandra,
diff --git a/tests/core/test_providers_manager.py b/tests/core/test_providers_manager.py
index 5fd0af4..7299971 100644
--- a/tests/core/test_providers_manager.py
+++ b/tests/core/test_providers_manager.py
@@ -21,6 +21,7 @@ import unittest
 from airflow.providers_manager import ProvidersManager
 
 ALL_PROVIDERS = [
+    'apache-airflow-providers-airbyte',
     'apache-airflow-providers-amazon',
     'apache-airflow-providers-apache-beam',
     'apache-airflow-providers-apache-cassandra',
diff --git a/tests/providers/airbyte/__init__.py b/tests/providers/airbyte/__init__.py
new file mode 100644
index 0000000..13a8339
--- /dev/null
+++ b/tests/providers/airbyte/__init__.py
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tests/providers/airbyte/hooks/__init__.py b/tests/providers/airbyte/hooks/__init__.py
new file mode 100644
index 0000000..13a8339
--- /dev/null
+++ b/tests/providers/airbyte/hooks/__init__.py
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tests/providers/airbyte/hooks/test_airbyte.py b/tests/providers/airbyte/hooks/test_airbyte.py
new file mode 100644
index 0000000..09f10be
--- /dev/null
+++ b/tests/providers/airbyte/hooks/test_airbyte.py
@@ -0,0 +1,126 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+import unittest
+from unittest import mock
+
+import pytest
+import requests_mock
+
+from airflow.exceptions import AirflowException
+from airflow.models import Connection
+from airflow.providers.airbyte.hooks.airbyte import AirbyteHook
+from airflow.utils import db
+
+
+class TestAirbyteHook(unittest.TestCase):
+    """
+    Test all functions from Airbyte Hook
+    """
+
+    airbyte_conn_id = 'airbyte_conn_id_test'
+    connection_id = 'conn_test_sync'
+    job_id = 1
+    sync_connection_endpoint = 'http://test-airbyte:8001/api/v1/connections/sync'
+    get_job_endpoint = 'http://test-airbyte:8001/api/v1/jobs/get'
+    _mock_sync_conn_success_response_body = {'job': {'id': 1}}
+    _mock_job_status_success_response_body = {'job': {'status': 'succeeded'}}
+
+    def setUp(self):
+        db.merge_conn(
+            Connection(
+                conn_id='airbyte_conn_id_test', conn_type='http', host='http://test-airbyte', port=8001
+            )
+        )
+        self.hook = AirbyteHook(airbyte_conn_id=self.airbyte_conn_id)
+
+    def return_value_get_job(self, status):
+        response = mock.Mock()
+        response.json.return_value = {'job': {'status': status}}
+        return response
+
+    @requests_mock.mock()
+    def test_submit_sync_connection(self, m):
+        m.post(
+            self.sync_connection_endpoint, status_code=200, json=self._mock_sync_conn_success_response_body
+        )
+        resp = self.hook.submit_sync_connection(connection_id=self.connection_id)
+        assert resp.status_code == 200
+        assert resp.json() == self._mock_sync_conn_success_response_body
+
+    @requests_mock.mock()
+    def test_get_job_status(self, m):
+        m.post(self.get_job_endpoint, status_code=200, json=self._mock_job_status_success_response_body)
+        resp = self.hook.get_job(job_id=self.job_id)
+        assert resp.status_code == 200
+        assert resp.json() == self._mock_job_status_success_response_body
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.get_job')
+    def test_wait_for_job_succeeded(self, mock_get_job):
+        mock_get_job.side_effect = [self.return_value_get_job(self.hook.SUCCEEDED)]
+        self.hook.wait_for_job(job_id=self.job_id, wait_seconds=0)
+        mock_get_job.assert_called_once_with(job_id=self.job_id)
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.get_job')
+    def test_wait_for_job_error(self, mock_get_job):
+        mock_get_job.side_effect = [
+            self.return_value_get_job(self.hook.RUNNING),
+            self.return_value_get_job(self.hook.ERROR),
+        ]
+        with pytest.raises(AirflowException, match="Job failed"):
+            self.hook.wait_for_job(job_id=self.job_id, wait_seconds=0)
+
+        calls = [mock.call(job_id=self.job_id), mock.call(job_id=self.job_id)]
+        mock_get_job.assert_has_calls(calls)
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.get_job')
+    def test_wait_for_job_timeout(self, mock_get_job):
+        mock_get_job.side_effect = [
+            self.return_value_get_job(self.hook.PENDING),
+            self.return_value_get_job(self.hook.RUNNING),
+            self.return_value_get_job(self.hook.RUNNING),
+        ]
+        with pytest.raises(AirflowException, match="Timeout"):
+            self.hook.wait_for_job(job_id=self.job_id, wait_seconds=2, timeout=1)
+
+        calls = [mock.call(job_id=self.job_id), mock.call(job_id=self.job_id), mock.call(job_id=self.job_id)]
+        assert mock_get_job.has_calls(calls)
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.get_job')
+    def test_wait_for_job_state_unrecognized(self, mock_get_job):
+        mock_get_job.side_effect = [
+            self.return_value_get_job(self.hook.RUNNING),
+            self.return_value_get_job("UNRECOGNIZED"),
+        ]
+        with pytest.raises(Exception, match="unexpected state"):
+            self.hook.wait_for_job(job_id=self.job_id, wait_seconds=0)
+
+        calls = [mock.call(job_id=self.job_id), mock.call(job_id=self.job_id)]
+        mock_get_job.assert_has_calls(calls)
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.get_job')
+    def test_wait_for_job_cancelled(self, mock_get_job):
+        mock_get_job.side_effect = [
+            self.return_value_get_job(self.hook.RUNNING),
+            self.return_value_get_job(self.hook.CANCELLED),
+        ]
+        with pytest.raises(AirflowException, match="Job was cancelled"):
+            self.hook.wait_for_job(job_id=self.job_id, wait_seconds=0)
+
+        calls = [mock.call(job_id=self.job_id), mock.call(job_id=self.job_id)]
+        mock_get_job.assert_has_calls(calls)
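For context, a minimal sketch of how the hook API exercised by these tests might be used directly; the Airflow connection id and the Airbyte connection UUID below are placeholders, not values from this change:

    from airflow.providers.airbyte.hooks.airbyte import AirbyteHook

    # Submit a sync for an Airbyte connection and block until the job reaches a
    # terminal state -- the same call sequence the tests above mock out.
    hook = AirbyteHook(airbyte_conn_id='airbyte_default')
    response = hook.submit_sync_connection(connection_id='15bc3800-82e4-48c3-a32d-620661273f28')
    job_id = response.json()['job']['id']
    hook.wait_for_job(job_id=job_id, wait_seconds=3, timeout=3600)
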
diff --git a/tests/providers/airbyte/operators/__init__.py b/tests/providers/airbyte/operators/__init__.py
new file mode 100644
index 0000000..13a8339
--- /dev/null
+++ b/tests/providers/airbyte/operators/__init__.py
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tests/providers/airbyte/operators/test_airbyte.py b/tests/providers/airbyte/operators/test_airbyte.py
new file mode 100644
index 0000000..bc56c5d
--- /dev/null
+++ b/tests/providers/airbyte/operators/test_airbyte.py
@@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+import unittest
+from unittest import mock
+
+from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator
+
+
+class TestAirbyteTriggerSyncOp(unittest.TestCase):
+    """
+    Test execute function from Airbyte Operator
+    """
+
+    airbyte_conn_id = 'test_airbyte_conn_id'
+    connection_id = 'test_airbyte_connection'
+    job_id = 1
+    wait_seconds = 0
+    timeout = 360
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.submit_sync_connection')
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.wait_for_job', return_value=None)
+    def test_execute(self, mock_wait_for_job, mock_submit_sync_connection):
+        mock_submit_sync_connection.return_value = mock.Mock(
+            **{'json.return_value': {'job': {'id': self.job_id}}}
+        )
+
+        op = AirbyteTriggerSyncOperator(
+            task_id='test_Airbyte_op',
+            airbyte_conn_id=self.airbyte_conn_id,
+            connection_id=self.connection_id,
+            wait_seconds=self.wait_seconds,
+            timeout=self.timeout,
+        )
+        op.execute({})
+
+        mock_submit_sync_connection.assert_called_once_with(connection_id=self.connection_id)
+        mock_wait_for_job.assert_called_once_with(
+            job_id=self.job_id, wait_seconds=self.wait_seconds, timeout=self.timeout
+        )
diff --git a/tests/providers/airbyte/sensors/__init__.py b/tests/providers/airbyte/sensors/__init__.py
new file mode 100644
index 0000000..13a8339
--- /dev/null
+++ b/tests/providers/airbyte/sensors/__init__.py
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git a/tests/providers/airbyte/sensors/test_airbyte.py b/tests/providers/airbyte/sensors/test_airbyte.py
new file mode 100644
index 0000000..5bd69b8
--- /dev/null
+++ b/tests/providers/airbyte/sensors/test_airbyte.py
@@ -0,0 +1,93 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import unittest
+from unittest import mock
+
+import pytest
+
+from airflow import AirflowException
+from airflow.providers.airbyte.sensors.airbyte import AirbyteJobSensor
+
+
+class TestAirbyteJobSensor(unittest.TestCase):
+
+    task_id = "task-id"
+    airbyte_conn_id = "airbyte-conn-test"
+    job_id = 1
+    timeout = 120
+
+    def get_job(self, status):
+        response = mock.Mock()
+        response.json.return_value = {'job': {'status': status}}
+        return response
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.get_job')
+    def test_done(self, mock_get_job):
+        mock_get_job.return_value = self.get_job('succeeded')
+
+        sensor = AirbyteJobSensor(
+            task_id=self.task_id,
+            airbyte_job_id=self.job_id,
+            airbyte_conn_id=self.airbyte_conn_id,
+        )
+        ret = sensor.poke(context={})
+        mock_get_job.assert_called_once_with(job_id=self.job_id)
+        assert ret
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.get_job')
+    def test_failed(self, mock_get_job):
+        mock_get_job.return_value = self.get_job('failed')
+
+        sensor = AirbyteJobSensor(
+            task_id=self.task_id,
+            airbyte_job_id=self.job_id,
+            airbyte_conn_id=self.airbyte_conn_id,
+        )
+        with pytest.raises(AirflowException, match="Job failed"):
+            sensor.poke(context={})
+
+        mock_get_job.assert_called_once_with(job_id=self.job_id)
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.get_job')
+    def test_running(self, mock_get_job):
+        mock_get_job.return_value = self.get_job('running')
+
+        sensor = AirbyteJobSensor(
+            task_id=self.task_id,
+            airbyte_job_id=self.job_id,
+            airbyte_conn_id=self.airbyte_conn_id,
+        )
+        ret = sensor.poke(context={})
+
+        mock_get_job.assert_called_once_with(job_id=self.job_id)
+
+        assert not ret
+
+    @mock.patch('airflow.providers.airbyte.hooks.airbyte.AirbyteHook.get_job')
+    def test_cancelled(self, mock_get_job):
+        mock_get_job.return_value = self.get_job('cancelled')
+
+        sensor = AirbyteJobSensor(
+            task_id=self.task_id,
+            airbyte_job_id=self.job_id,
+            airbyte_conn_id=self.airbyte_conn_id,
+        )
+        with pytest.raises(AirflowException, match="Job was cancelled"):
+            sensor.poke(context={})
+
+        mock_get_job.assert_called_once_with(job_id=self.job_id)
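Taken together, a hedged sketch of how the operator and sensor covered by these tests might be wired into a DAG; the connection ids, Airbyte connection UUID and job id are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator
    from airflow.providers.airbyte.sensors.airbyte import AirbyteJobSensor

    with DAG(dag_id='airbyte_sync_example', start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
        # Trigger a sync and wait in-task until it finishes -- the synchronous
        # behaviour exercised in test_execute above.
        trigger_sync = AirbyteTriggerSyncOperator(
            task_id='trigger_airbyte_sync',
            airbyte_conn_id='airbyte_default',
            connection_id='15bc3800-82e4-48c3-a32d-620661273f28',
            wait_seconds=3,
            timeout=3600,
        )

        # Alternatively, poll an already-submitted job with the sensor.
        wait_for_job = AirbyteJobSensor(
            task_id='wait_for_airbyte_job',
            airbyte_conn_id='airbyte_default',
            airbyte_job_id=123,  # placeholder job id, e.g. pulled from XCom in a real DAG
        )
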

[airflow] 07/36: not fail on missing status in tests

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit f473309aba7757a9e2d6195143e416ab981672b5
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Tue Apr 6 02:42:41 2021 +0200

    not fail on missing status in tests
---
 scripts/ci/libraries/_parallel.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/ci/libraries/_parallel.sh b/scripts/ci/libraries/_parallel.sh
index 739bae1..e2f8ad4 100644
--- a/scripts/ci/libraries/_parallel.sh
+++ b/scripts/ci/libraries/_parallel.sh
@@ -147,7 +147,7 @@ function parallel::print_job_summary_and_return_status_code() {
     for job_path in "${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/"*
     do
         job="$(basename "${job_path}")"
-        status=$(cat "${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/${job}/status")
+        status=$(cat "${PARALLEL_MONITORED_DIR}/${SEMAPHORE_NAME}/${job}/status" || true)
         if [[ ${status} == "0" ]]; then
             parallel::output_log_for_successful_job "${job}"
         else
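The same idea in Python terms, as a hedged illustration of why the `|| true` matters; the helper below is hypothetical and not part of the CI scripts:

    from pathlib import Path

    def read_job_status(status_file: Path) -> str:
        """Return the recorded status, or '' when the job never wrote one."""
        try:
            return status_file.read_text().strip()
        except FileNotFoundError:
            # Mirrors `cat ... || true`: a missing status file should not abort
            # the summary loop; the job is simply reported as failed.
            return ""
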

[airflow] 23/36: Fixed #14270: Add error message in OOM situations (#15207)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 8adb12651de6955b1a702f04af5578c88e950603
Author: Andrew Godwin <an...@astronomer.io>
AuthorDate: Tue Apr 6 13:02:11 2021 -0600

    Fixed #14270: Add error message in OOM situations (#15207)
    
    In the case where a child process is reaped early (before we get to it),
    the code presumes this is due to an OOM error and sets the return code
    to -9. This adds an error message alongside that return code to make the
    cause more obvious.
    
    (cherry picked from commit 18e2c1de776c8c3bc42c984ea0d31515788b6572)
---
 airflow/task/task_runner/standard_task_runner.py   |  8 +++
 .../task/task_runner/test_standard_task_runner.py  | 59 ++++++++++++++++++----
 2 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/airflow/task/task_runner/standard_task_runner.py b/airflow/task/task_runner/standard_task_runner.py
index 505b225..bb566b2 100644
--- a/airflow/task/task_runner/standard_task_runner.py
+++ b/airflow/task/task_runner/standard_task_runner.py
@@ -121,3 +121,11 @@ class StandardTaskRunner(BaseTaskRunner):
         if self._rc is None:
             # Something else reaped it before we had a chance, so let's just "guess" at an error code.
             self._rc = -9
+
+        if self._rc == -9:
+            # If either we or psutil gives out a -9 return code, it likely means
+            # an OOM happened
+            self.log.error(
+                'Job %s was killed before it finished (likely due to running out of memory)',
+                self._task_instance.job_id,
+            )
diff --git a/tests/task/task_runner/test_standard_task_runner.py b/tests/task/task_runner/test_standard_task_runner.py
index fcd4948..6a3ab5d 100644
--- a/tests/task/task_runner/test_standard_task_runner.py
+++ b/tests/task/task_runner/test_standard_task_runner.py
@@ -19,11 +19,11 @@ import getpass
 import logging
 import os
 import time
-import unittest
 from logging.config import dictConfig
 from unittest import mock
 
 import psutil
+import pytest
 
 from airflow import models, settings
 from airflow.jobs.local_task_job import LocalTaskJob
@@ -48,22 +48,24 @@ LOGGING_CONFIG = {
             'class': 'logging.StreamHandler',
             'formatter': 'airflow.task',
             'stream': 'ext://sys.stdout',
-        }
+        },
     },
-    'loggers': {'airflow': {'handlers': ['console'], 'level': 'INFO', 'propagate': False}},
+    'loggers': {'airflow': {'handlers': ['console'], 'level': 'INFO', 'propagate': True}},
 }
 
 
-class TestStandardTaskRunner(unittest.TestCase):
-    @classmethod
-    def setUpClass(cls):
+class TestStandardTaskRunner:
+    @pytest.fixture(autouse=True, scope="class")
+    def logging_and_db(self):
+        """
+        This fixture sets up logging differently on the way in (as the test
+        environment does not have enough context to configure it the normal
+        way) and ensures it is reset back to normal on the way out.
+        """
         dictConfig(LOGGING_CONFIG)
-
-    @classmethod
-    def tearDownClass(cls):
+        yield
         airflow_logger = logging.getLogger('airflow')
         airflow_logger.handlers = []
-        airflow_logger.propagate = True
         try:
             clear_db_runs()
         except Exception:  # noqa pylint: disable=broad-except
@@ -131,6 +133,43 @@ class TestStandardTaskRunner(unittest.TestCase):
 
         assert runner.return_code() is not None
 
+    def test_early_reap_exit(self, caplog):
+        """
+        Tests that when a child process running a task is killed externally
+        (e.g. by an OOM error, which we fake here), then we get return code
+        -9 and a log message.
+        """
+        # Set up mock task
+        local_task_job = mock.Mock()
+        local_task_job.task_instance = mock.MagicMock()
+        local_task_job.task_instance.run_as_user = getpass.getuser()
+        local_task_job.task_instance.command_as_list.return_value = [
+            'airflow',
+            'tasks',
+            'test',
+            'test_on_kill',
+            'task1',
+            '2016-01-01',
+        ]
+
+        # Kick off the runner
+        runner = StandardTaskRunner(local_task_job)
+        runner.start()
+        time.sleep(0.2)
+
+        # Kill the child process externally from the runner
+        # Note that we have to do this from ANOTHER process, as if we just
+        # call os.kill here we're doing it from the parent process and it
+        # won't be the same as an external kill in terms of OS tracking.
+        pgid = os.getpgid(runner.process.pid)
+        os.system(f"kill -s KILL {pgid}")
+        time.sleep(0.2)
+
+        runner.terminate()
+
+        assert runner.return_code() == -9
+        assert "running out of memory" in caplog.text
+
     def test_on_kill(self):
         """
         Test that ensures that clearing in the UI SIGTERMS
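A POSIX-only sketch (not Airflow code) of the situation the commit message describes: a parent observing that its child was reaped after an external SIGKILL, which Airflow then surfaces as return code -9:

    import os
    import signal
    import time

    pid = os.fork()
    if pid == 0:
        time.sleep(60)  # child: idle long enough for the parent to kill it
        os._exit(0)

    os.kill(pid, signal.SIGKILL)    # simulate the kernel OOM killer
    _, status = os.waitpid(pid, 0)  # reap the child
    if os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL:
        # psutil/Airflow conventionally surface this as return code -9.
        print("child killed before finishing, likely OOM; return code", -signal.SIGKILL)
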

[airflow] 33/36: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods (#15204)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 110adfef9e89f2d88b74d50008437bbb8c30ee64
Author: Xiaodong DENG <xd...@apache.org>
AuthorDate: Thu Apr 8 12:11:08 2021 +0200

    BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods (#15204)
    
    closes: #15193
    
    Currently, whether a pod was created by Airflow is not taken into account. This commit fixes that.
    
    We decide whether a Pod was created by Airflow by checking that it carries all the labels added in PodGenerator.construct_pod() or KubernetesPodOperator.create_labels_for_pod().
    
    (cherry picked from commit c594d9cfb32bbcfe30af3f5dcb452c6053cacc95)
---
 airflow/cli/cli_parser.py                     |  6 +++-
 airflow/cli/commands/kubernetes_command.py    | 18 +++++++++++-
 tests/cli/commands/test_kubernetes_command.py | 40 +++++++++++++++++++++------
 3 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/airflow/cli/cli_parser.py b/airflow/cli/cli_parser.py
index c33a854..b5384b4 100644
--- a/airflow/cli/cli_parser.py
+++ b/airflow/cli/cli_parser.py
@@ -1341,7 +1341,11 @@ CONFIG_COMMANDS = (
 KUBERNETES_COMMANDS = (
     ActionCommand(
         name='cleanup-pods',
-        help="Clean up Kubernetes pods in evicted/failed/succeeded states",
+        help=(
+            "Clean up Kubernetes pods "
+            "(created by KubernetesExecutor/KubernetesPodOperator) "
+            "in evicted/failed/succeeded states"
+        ),
         func=lazy_load_command('airflow.cli.commands.kubernetes_command.cleanup_pods'),
         args=(ARG_NAMESPACE,),
     ),
diff --git a/airflow/cli/commands/kubernetes_command.py b/airflow/cli/commands/kubernetes_command.py
index f98c45e..daf11a3 100644
--- a/airflow/cli/commands/kubernetes_command.py
+++ b/airflow/cli/commands/kubernetes_command.py
@@ -90,7 +90,23 @@ def cleanup_pods(args):
     print('Loading Kubernetes configuration')
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
-    list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+    ]
+    list_kwargs = {
+        "namespace": namespace,
+        "limit": 500,
+        "label_selector": client.V1LabelSelector(
+            match_expressions=[
+                client.V1LabelSelectorRequirement(key=label, operator="Exists")
+                for label in airflow_pod_labels
+            ]
+        ),
+    }
     while True:  # pylint: disable=too-many-nested-blocks
         pod_list = kube_client.list_namespaced_pod(**list_kwargs)
         for pod in pod_list.items:
diff --git a/tests/cli/commands/test_kubernetes_command.py b/tests/cli/commands/test_kubernetes_command.py
index 8ae2eef..707eb55 100644
--- a/tests/cli/commands/test_kubernetes_command.py
+++ b/tests/cli/commands/test_kubernetes_command.py
@@ -55,6 +55,13 @@ class TestGenerateDagYamlCommand(unittest.TestCase):
 
 
 class TestCleanUpPodsCommand(unittest.TestCase):
+    label_selector = kubernetes.client.V1LabelSelector(
+        match_expressions=[
+            kubernetes.client.V1LabelSelectorRequirement(key=label, operator="Exists")
+            for label in ['dag_id', 'task_id', 'execution_date', 'try_number', 'airflow_version']
+        ]
+    )
+
     @classmethod
     def setUpClass(cls):
         cls.parser = cli_parser.get_parser()
@@ -79,7 +86,9 @@ class TestCleanUpPodsCommand(unittest.TestCase):
         kubernetes_command.cleanup_pods(
             self.parser.parse_args(['kubernetes', 'cleanup-pods', '--namespace', 'awesome-namespace'])
         )
-        list_namespaced_pod.assert_called_once_with(namespace='awesome-namespace', limit=500)
+        list_namespaced_pod.assert_called_once_with(
+            namespace='awesome-namespace', limit=500, label_selector=self.label_selector
+        )
         delete_pod.assert_not_called()
         load_incluster_config.assert_called_once()
 
@@ -98,7 +107,9 @@ class TestCleanUpPodsCommand(unittest.TestCase):
         kubernetes_command.cleanup_pods(
             self.parser.parse_args(['kubernetes', 'cleanup-pods', '--namespace', 'awesome-namespace'])
         )
-        list_namespaced_pod.assert_called_once_with(namespace='awesome-namespace', limit=500)
+        list_namespaced_pod.assert_called_once_with(
+            namespace='awesome-namespace', limit=500, label_selector=self.label_selector
+        )
         delete_pod.assert_called_with('dummy', 'awesome-namespace')
         load_incluster_config.assert_called_once()
 
@@ -120,7 +131,9 @@ class TestCleanUpPodsCommand(unittest.TestCase):
         kubernetes_command.cleanup_pods(
             self.parser.parse_args(['kubernetes', 'cleanup-pods', '--namespace', 'awesome-namespace'])
         )
-        list_namespaced_pod.assert_called_once_with(namespace='awesome-namespace', limit=500)
+        list_namespaced_pod.assert_called_once_with(
+            namespace='awesome-namespace', limit=500, label_selector=self.label_selector
+        )
         delete_pod.assert_not_called()
         load_incluster_config.assert_called_once()
 
@@ -142,7 +155,9 @@ class TestCleanUpPodsCommand(unittest.TestCase):
         kubernetes_command.cleanup_pods(
             self.parser.parse_args(['kubernetes', 'cleanup-pods', '--namespace', 'awesome-namespace'])
         )
-        list_namespaced_pod.assert_called_once_with(namespace='awesome-namespace', limit=500)
+        list_namespaced_pod.assert_called_once_with(
+            namespace='awesome-namespace', limit=500, label_selector=self.label_selector
+        )
         delete_pod.assert_called_with('dummy3', 'awesome-namespace')
         load_incluster_config.assert_called_once()
 
@@ -162,7 +177,9 @@ class TestCleanUpPodsCommand(unittest.TestCase):
         kubernetes_command.cleanup_pods(
             self.parser.parse_args(['kubernetes', 'cleanup-pods', '--namespace', 'awesome-namespace'])
         )
-        list_namespaced_pod.assert_called_once_with(namespace='awesome-namespace', limit=500)
+        list_namespaced_pod.assert_called_once_with(
+            namespace='awesome-namespace', limit=500, label_selector=self.label_selector
+        )
         delete_pod.assert_called_with('dummy4', 'awesome-namespace')
         load_incluster_config.assert_called_once()
 
@@ -182,7 +199,9 @@ class TestCleanUpPodsCommand(unittest.TestCase):
         kubernetes_command.cleanup_pods(
             self.parser.parse_args(['kubernetes', 'cleanup-pods', '--namespace', 'awesome-namespace'])
         )
-        list_namespaced_pod.assert_called_once_with(namespace='awesome-namespace', limit=500)
+        list_namespaced_pod.assert_called_once_with(
+            namespace='awesome-namespace', limit=500, label_selector=self.label_selector
+        )
         load_incluster_config.assert_called_once()
 
     @mock.patch('airflow.cli.commands.kubernetes_command._delete_pod')
@@ -204,8 +223,13 @@ class TestCleanUpPodsCommand(unittest.TestCase):
             self.parser.parse_args(['kubernetes', 'cleanup-pods', '--namespace', 'awesome-namespace'])
         )
         calls = [
-            call.first(namespace='awesome-namespace', limit=500),
-            call.second(namespace='awesome-namespace', limit=500, _continue='dummy-token'),
+            call.first(namespace='awesome-namespace', limit=500, label_selector=self.label_selector),
+            call.second(
+                namespace='awesome-namespace',
+                limit=500,
+                label_selector=self.label_selector,
+                _continue='dummy-token',
+            ),
         ]
         list_namespaced_pod.assert_has_calls(calls)
         delete_pod.assert_called_with('dummy', 'awesome-namespace')
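As a hedged aside, the same "these label keys must exist" filter can also be expressed as a plain label-selector string, which `list_namespaced_pod` accepts directly; the namespace and kubeconfig handling below are placeholders:

    from kubernetes import client, config

    airflow_pod_labels = ['dag_id', 'task_id', 'execution_date', 'try_number', 'airflow_version']
    # In string form, a bare key means "this label key exists on the pod".
    label_selector = ','.join(airflow_pod_labels)

    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    pods = client.CoreV1Api().list_namespaced_pod(
        namespace='airflow', limit=500, label_selector=label_selector
    )
    for pod in pods.items:
        print(pod.metadata.name, pod.status.phase)
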

[airflow] 34/36: Change default of `[kubernetes] enable_tcp_keepalive` to `True` (#15338)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit def23a0a3b9b400c5251d2b230735d617c1ab590
Author: Jed Cunningham <66...@users.noreply.github.com>
AuthorDate: Tue Apr 13 08:24:09 2021 -0600

    Change default of `[kubernetes] enable_tcp_keepalive` to `True` (#15338)
    
    We've seen instances of connection resets happening, particularly in
    Azure, that are remedied by enabling tcp_keepalive. Enabling it by
    default should be safe and sane regardless of where we are running.
    
    (cherry picked from commit 6e31465a30dfd17e2e1409a81600b2e83c910036)
---
 airflow/config_templates/config.yml          | 2 +-
 airflow/config_templates/default_airflow.cfg | 2 +-
 airflow/kubernetes/kube_client.py            | 8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/airflow/config_templates/config.yml b/airflow/config_templates/config.yml
index 32694e4..c92acf8 100644
--- a/airflow/config_templates/config.yml
+++ b/airflow/config_templates/config.yml
@@ -2065,7 +2065,7 @@
       version_added: ~
       type: boolean
       example: ~
-      default: "False"
+      default: "True"
     - name: tcp_keep_idle
       description: |
         When the `enable_tcp_keepalive` option is enabled, TCP probes a connection that has
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 7685457..bc4d54a 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -1014,7 +1014,7 @@ delete_option_kwargs =
 
 # Enables TCP keepalive mechanism. This prevents Kubernetes API requests to hang indefinitely
 # when idle connection is time-outed on services like cloud load balancers or firewalls.
-enable_tcp_keepalive = False
+enable_tcp_keepalive = True
 
 # When the `enable_tcp_keepalive` option is enabled, TCP probes a connection that has
 # been idle for `tcp_keep_idle` seconds.
diff --git a/airflow/kubernetes/kube_client.py b/airflow/kubernetes/kube_client.py
index 7e8c5e8..1e65ae5 100644
--- a/airflow/kubernetes/kube_client.py
+++ b/airflow/kubernetes/kube_client.py
@@ -80,9 +80,9 @@ def _enable_tcp_keepalive() -> None:
 
     from urllib3.connection import HTTPConnection, HTTPSConnection
 
-    tcp_keep_idle = conf.getint('kubernetes', 'tcp_keep_idle', fallback=120)
-    tcp_keep_intvl = conf.getint('kubernetes', 'tcp_keep_intvl', fallback=30)
-    tcp_keep_cnt = conf.getint('kubernetes', 'tcp_keep_cnt', fallback=6)
+    tcp_keep_idle = conf.getint('kubernetes', 'tcp_keep_idle')
+    tcp_keep_intvl = conf.getint('kubernetes', 'tcp_keep_intvl')
+    tcp_keep_cnt = conf.getint('kubernetes', 'tcp_keep_cnt')
 
     socket_options = [
         (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
@@ -120,7 +120,7 @@ def get_kube_client(
         if config_file is None:
             config_file = conf.get('kubernetes', 'config_file', fallback=None)
 
-    if conf.getboolean('kubernetes', 'enable_tcp_keepalive', fallback=False):
+    if conf.getboolean('kubernetes', 'enable_tcp_keepalive'):
         _enable_tcp_keepalive()
 
     client_conf = _get_kube_config(in_cluster, cluster_context, config_file)
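For reference, a condensed sketch of what enabling TCP keepalive amounts to for the urllib3 connections the Kubernetes client uses, assuming the documented defaults of 120/30/6:

    import socket

    from urllib3.connection import HTTPConnection, HTTPSConnection

    tcp_keep_idle, tcp_keep_intvl, tcp_keep_cnt = 120, 30, 6

    socket_options = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    # The TCP-level knobs are platform dependent, hence the hasattr() guards.
    if hasattr(socket, 'TCP_KEEPIDLE'):
        socket_options.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, tcp_keep_idle))
    if hasattr(socket, 'TCP_KEEPINTVL'):
        socket_options.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, tcp_keep_intvl))
    if hasattr(socket, 'TCP_KEEPCNT'):
        socket_options.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, tcp_keep_cnt))

    HTTPConnection.default_socket_options = HTTPConnection.default_socket_options + socket_options
    HTTPSConnection.default_socket_options = HTTPSConnection.default_socket_options + socket_options
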

[airflow] 32/36: Fix password masking in CLI action_logging (#15143)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 84d305fbc9160d320f45e35887af7a1bad538bc7
Author: Xiaodong DENG <xd...@apache.org>
AuthorDate: Thu Apr 1 23:02:28 2021 +0200

    Fix password masking in CLI action_logging (#15143)
    
    Currently, as long as the argument '-p' is present, the code tries to mask it.
    
    However, '-p' may mean something other than a password, such as a boolean flag or a port number. Such cases may result in an exception.
    
    (cherry picked from commit 486b76438c0679682cf98cb88ed39c4b161cbcc8)
---
 airflow/utils/cli.py         | 20 +++++++++++---------
 tests/utils/test_cli_util.py | 10 ++++++++++
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/airflow/utils/cli.py b/airflow/utils/cli.py
index 68a0b44..fc73dfc 100644
--- a/airflow/utils/cli.py
+++ b/airflow/utils/cli.py
@@ -110,17 +110,19 @@ def _build_metrics(func_name, namespace):
     """
     from airflow.models import Log
 
+    sub_commands_to_check = {'users', 'connections'}
     sensitive_fields = {'-p', '--password', '--conn-password'}
     full_command = list(sys.argv)
-    for idx, command in enumerate(full_command):  # pylint: disable=too-many-nested-blocks
-        if command in sensitive_fields:
-            # For cases when password is passed as "--password xyz" (with space between key and value)
-            full_command[idx + 1] = "*" * 8
-        else:
-            # For cases when password is passed as "--password=xyz" (with '=' between key and value)
-            for sensitive_field in sensitive_fields:
-                if command.startswith(f'{sensitive_field}='):
-                    full_command[idx] = f'{sensitive_field}={"*" * 8}'
+    if full_command[1] in sub_commands_to_check:  # pylint: disable=too-many-nested-blocks
+        for idx, command in enumerate(full_command):
+            if command in sensitive_fields:
+                # For cases when password is passed as "--password xyz" (with space between key and value)
+                full_command[idx + 1] = "*" * 8
+            else:
+                # For cases when password is passed as "--password=xyz" (with '=' between key and value)
+                for sensitive_field in sensitive_fields:
+                    if command.startswith(f'{sensitive_field}='):
+                        full_command[idx] = f'{sensitive_field}={"*" * 8}'
 
     metrics = {
         'sub_command': func_name,
diff --git a/tests/utils/test_cli_util.py b/tests/utils/test_cli_util.py
index c567f44..6d88f66 100644
--- a/tests/utils/test_cli_util.py
+++ b/tests/utils/test_cli_util.py
@@ -112,9 +112,19 @@ class TestCliUtil(unittest.TestCase):
                 "airflow connections add dsfs --conn-login asd --conn-password test --conn-type google",
                 "airflow connections add dsfs --conn-login asd --conn-password ******** --conn-type google",
             ),
+            (
+                "airflow scheduler -p",
+                "airflow scheduler -p",
+            ),
+            (
+                "airflow celery flower -p 8888",
+                "airflow celery flower -p 8888",
+            ),
         ]
     )
     def test_cli_create_user_supplied_password_is_masked(self, given_command, expected_masked_command):
+        # A '-p' value which is not a password, like 'airflow scheduler -p'
+        # or 'airflow celery flower -p 8888', should not be masked
         args = given_command.split()
 
         expected_command = expected_masked_command.split()
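Put another way, a hedged restatement of the masking rule as a standalone helper; the function name is hypothetical, not Airflow's API:

    def mask_sensitive_args(full_command):
        """Mask password-style values, but only for 'users'/'connections' sub-commands."""
        sub_commands_to_check = {'users', 'connections'}
        sensitive_fields = {'-p', '--password', '--conn-password'}
        masked = list(full_command)
        if len(masked) > 1 and masked[1] in sub_commands_to_check:
            for idx, arg in enumerate(masked):
                if arg in sensitive_fields and idx + 1 < len(masked):
                    # "--password xyz" style: mask the value that follows the flag
                    masked[idx + 1] = '*' * 8
                else:
                    # "--password=xyz" style: mask the value after '='
                    for field in sensitive_fields:
                        if arg.startswith(f'{field}='):
                            masked[idx] = f'{field}={"*" * 8}'
        return masked

    print(mask_sensitive_args(['airflow', 'users', 'create', '-p', 'secret']))  # password masked
    print(mask_sensitive_args(['airflow', 'celery', 'flower', '-p', '8888']))   # port left untouched
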

[airflow] 29/36: Restore base lineage backend (#14146)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit ac00eab33446171c2776d2822621fa8845a4bfcd
Author: João Ponte <JP...@users.noreply.github.com>
AuthorDate: Sat Apr 3 10:26:59 2021 +0200

    Restore base lineage backend (#14146)
    
    This adds back the base lineage backend which can be extended to send lineage metadata to any custom backend.
    closes: #14106
    
    Co-authored-by: Joao Ponte <jp...@plista.com>
    Co-authored-by: Tomek Urbaszek <tu...@gmail.com>
    (cherry picked from commit af2d11e36ed43b0103a54780640493b8ae46d70e)
---
 airflow/lineage/__init__.py     | 22 ++++++++++++++++++
 airflow/lineage/backend.py      | 47 +++++++++++++++++++++++++++++++++++++++
 docs/apache-airflow/lineage.rst | 21 ++++++++++++++++++
 tests/lineage/test_lineage.py   | 49 ++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/airflow/lineage/__init__.py b/airflow/lineage/__init__.py
index 65f19ef..905eb00 100644
--- a/airflow/lineage/__init__.py
+++ b/airflow/lineage/__init__.py
@@ -25,6 +25,8 @@ import attr
 import jinja2
 from cattr import structure, unstructure
 
+from airflow.configuration import conf
+from airflow.lineage.backend import LineageBackend
 from airflow.utils.module_loading import import_string
 
 ENV = jinja2.Environment()
@@ -45,6 +47,22 @@ class Metadata:
     data: Dict = attr.ib()
 
 
+def get_backend() -> Optional[LineageBackend]:
+    """Gets the lineage backend if defined in the configs"""
+    clazz = conf.getimport("lineage", "backend", fallback=None)
+
+    if clazz:
+        if not issubclass(clazz, LineageBackend):
+            raise TypeError(
+                f"Your custom Lineage class `{clazz.__name__}` "
+                f"is not a subclass of `{LineageBackend.__name__}`."
+            )
+        else:
+            return clazz()
+
+    return None
+
+
 def _get_instance(meta: Metadata):
     """Instantiate an object from Metadata"""
     cls = import_string(meta.type_name)
@@ -82,6 +100,7 @@ def apply_lineage(func: T) -> T:
     Saves the lineage to XCom and if configured to do so sends it
     to the backend.
     """
+    _backend = get_backend()
 
     @wraps(func)
     def wrapper(self, context, *args, **kwargs):
@@ -101,6 +120,9 @@ def apply_lineage(func: T) -> T:
                 context, key=PIPELINE_INLETS, value=inlets, execution_date=context['ti'].execution_date
             )
 
+        if _backend:
+            _backend.send_lineage(operator=self, inlets=self.inlets, outlets=self.outlets, context=context)
+
         return ret_val
 
     return cast(T, wrapper)
diff --git a/airflow/lineage/backend.py b/airflow/lineage/backend.py
new file mode 100644
index 0000000..edfbe0e
--- /dev/null
+++ b/airflow/lineage/backend.py
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Sends lineage metadata to a backend"""
+from typing import TYPE_CHECKING, Optional
+
+if TYPE_CHECKING:
+    from airflow.models.baseoperator import BaseOperator  # pylint: disable=cyclic-import
+
+
+class LineageBackend:
+    """Sends lineage metadata to a backend"""
+
+    def send_lineage(
+        self,
+        operator: 'BaseOperator',
+        inlets: Optional[list] = None,
+        outlets: Optional[list] = None,
+        context: Optional[dict] = None,
+    ):
+        """
+        Sends lineage metadata to a backend
+
+        :param operator: the operator executing a transformation on the inlets and outlets
+        :type operator: airflow.models.baseoperator.BaseOperator
+        :param inlets: the inlets to this operator
+        :type inlets: list
+        :param outlets: the outlets from this operator
+        :type outlets: list
+        :param context: the current context of the task instance
+        :type context: dict
+        """
+        raise NotImplementedError()
diff --git a/docs/apache-airflow/lineage.rst b/docs/apache-airflow/lineage.rst
index a29f042..362d3e6 100644
--- a/docs/apache-airflow/lineage.rst
+++ b/docs/apache-airflow/lineage.rst
@@ -95,3 +95,24 @@ has outlets defined (e.g. by using ``add_outlets(..)`` or has out of the box sup
     f_in > run_this | (run_this_last > outlets)
 
 .. _precedence: https://docs.python.org/3/reference/expressions.html
+
+
+Lineage Backend
+---------------
+
+It's possible to push lineage metrics to a custom backend by pointing the config at a ``LineageBackend`` subclass:
+
+.. code-block:: ini
+
+  [lineage]
+  backend = my.lineage.CustomBackend
+
+The backend should inherit from ``airflow.lineage.LineageBackend``.
+
+.. code-block:: python
+
+  from airflow.lineage.backend import LineageBackend
+
+  class ExampleBackend(LineageBackend):
+    def send_lineage(self, operator, inlets=None, outlets=None, context=None):
+      ...  # Send the info to some external service
diff --git a/tests/lineage/test_lineage.py b/tests/lineage/test_lineage.py
index 350a8be..b5ebbea 100644
--- a/tests/lineage/test_lineage.py
+++ b/tests/lineage/test_lineage.py
@@ -16,16 +16,24 @@
 # specific language governing permissions and limitations
 # under the License.
 import unittest
+from unittest import mock
 
-from airflow.lineage import AUTO
+from airflow.lineage import AUTO, apply_lineage, get_backend, prepare_lineage
+from airflow.lineage.backend import LineageBackend
 from airflow.lineage.entities import File
 from airflow.models import DAG, TaskInstance as TI
 from airflow.operators.dummy import DummyOperator
 from airflow.utils import timezone
+from tests.test_utils.config import conf_vars
 
 DEFAULT_DATE = timezone.datetime(2016, 1, 1)
 
 
+class CustomLineageBackend(LineageBackend):
+    def send_lineage(self, operator, inlets=None, outlets=None, context=None):
+        pass
+
+
 class TestLineage(unittest.TestCase):
     def test_lineage(self):
         dag = DAG(dag_id='test_prepare_lineage', start_date=DEFAULT_DATE)
@@ -111,3 +119,42 @@ class TestLineage(unittest.TestCase):
         op1.pre_execute(ctx1)
         assert op1.inlets[0].url == f1s.format(DEFAULT_DATE)
         assert op1.outlets[0].url == f1s.format(DEFAULT_DATE)
+
+    @mock.patch("airflow.lineage.get_backend")
+    def test_lineage_is_sent_to_backend(self, mock_get_backend):
+        class TestBackend(LineageBackend):
+            def send_lineage(self, operator, inlets=None, outlets=None, context=None):
+                assert len(inlets) == 1
+                assert len(outlets) == 1
+
+        func = mock.Mock()
+        func.__name__ = 'foo'
+
+        mock_get_backend.return_value = TestBackend()
+
+        dag = DAG(dag_id='test_lineage_is_sent_to_backend', start_date=DEFAULT_DATE)
+
+        with dag:
+            op1 = DummyOperator(task_id='task1')
+
+        file1 = File("/tmp/some_file")
+
+        op1.inlets.append(file1)
+        op1.outlets.append(file1)
+
+        ctx1 = {"ti": TI(task=op1, execution_date=DEFAULT_DATE), "execution_date": DEFAULT_DATE}
+
+        prep = prepare_lineage(func)
+        prep(op1, ctx1)
+        post = apply_lineage(func)
+        post(op1, ctx1)
+
+    def test_empty_lineage_backend(self):
+        backend = get_backend()
+        assert backend is None
+
+    @conf_vars({("lineage", "backend"): "tests.lineage.test_lineage.CustomLineageBackend"})
+    def test_resolve_lineage_class(self):
+        backend = get_backend()
+        assert issubclass(backend.__class__, LineageBackend)
+        assert isinstance(backend, CustomLineageBackend)
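A minimal sketch of a custom backend enabled by this change; the module path in the config comment is a placeholder:

    import logging

    from airflow.lineage.backend import LineageBackend


    class LoggingLineageBackend(LineageBackend):
        """Toy backend that logs lineage instead of shipping it to an external service."""

        def send_lineage(self, operator, inlets=None, outlets=None, context=None):
            logging.getLogger(__name__).info(
                "lineage for task %s: inlets=%s outlets=%s", operator.task_id, inlets, outlets
            )

    # Enabled via airflow.cfg (placeholder module path):
    #   [lineage]
    #   backend = my_plugin.lineage.LoggingLineageBackend
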

[airflow] 20/36: Sort Committers via their names instead of usernames (#14403)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 8d902cb56003aeefdf0b2a5c6cb9386ad42f8571
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Wed Feb 24 06:00:55 2021 +0000

    Sort Committers via their names instead of usernames (#14403)
    
    Previously:
    
    GithubUsername (Name)
    
    Now:
    
    Name (GithubUsername)
    
    (cherry picked from commit 3dc762c8177264001793e20543c24c6414c14960)
---
 docs/apache-airflow/project.rst | 70 ++++++++++++++++++++---------------------
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/docs/apache-airflow/project.rst b/docs/apache-airflow/project.rst
index 216b927..801d4c0 100644
--- a/docs/apache-airflow/project.rst
+++ b/docs/apache-airflow/project.rst
@@ -36,41 +36,41 @@ in January 2019.
 Committers
 ----------
 
-- @aijamalnk (Aizhamal Nurmamat kyzy)
-- @alexvanboxel (Alex Van Boxel)
-- @aoen (Dan Davydov)
-- @artwr (Arthur Wiedmer)
-- @ashb (Ash Berlin-Taylor)
-- @basph (Bas Harenslak)
-- @bolkedebruin (Bolke de Bruin)
-- @criccomini (Chris Riccomini)
-- @dimberman (Daniel Imberman)
-- @ephraimbuddy (Ephraim Anierobi)
-- @feluelle (Felix Uellendall)
-- @feng-tao (Tao Feng)
-- @fokko (Fokko Driesprong)
-- @hiteshs (Hitesh Shah)
-- @houqp (Qingping Hou)
-- @jghoman (Jakob Homan)
-- @jmcarp (Joshua Carp)
-- @joygao (Joy Gao)
-- @kaxil (Kaxil Naik)
-- @KevinYang21 (Kevin Yang)
-- @leahecole (Leah Cole)
-- @mik-laj (Kamil Breguła)
-- @milton0825 (Chao-Han Tsai)
-- @mistercrunch (Maxime "Max" Beauchemin)
-- @msumit (Sumit Maheshwari)
-- @potiuk (Jarek Potiuk)
-- @r39132 (Siddharth "Sid" Anand)
-- @ryanahamilton (Ryan Hamilton)
-- @ryw (Ry Walker)
-- @saguziel (Alex Guziel)
-- @sekikn (Kengo Seki)
-- @turbaszek (Tomasz Urbaszek)
-- @vikramkoka (Vikram Koka)
-- @XD-DENG (Xiaodong Deng)
-- @zhongjiajie (Jiajie Zhong)
+- Aizhamal Nurmamat kyzy (@aijamalnk)
+- Alex Guziel (@saguziel)
+- Alex Van Boxel (@alexvanboxel)
+- Arthur Wiedmer (@artwr)
+- Ash Berlin-Taylor (@ashb)
+- Bas Harenslak (@basph)
+- Bolke de Bruin (@bolkedebruin)
+- Chao-Han Tsai (@milton0825)
+- Chris Riccomini (@criccomini)
+- Dan Davydov (@aoen)
+- Daniel Imberman (@dimberman)
+- Ephraim Anierobi (@ephraimbuddy)
+- Felix Uellendall (@feluelle)
+- Fokko Driesprong (@fokko)
+- Hitesh Shah (@hiteshs)
+- Jakob Homan (@jghoman)
+- Jarek Potiuk (@potiuk)
+- Jiajie Zhong (@zhongjiajie)
+- Joshua Carp (@jmcarp)
+- Joy Gao (@joygao)
+- Kamil Breguła (@mik-laj)
+- Kaxil Naik (@kaxil)
+- Kengo Seki (@sekikn)
+- Kevin Yang (@KevinYang21)
+- Leah Cole (@leahecole)
+- Maxime "Max" Beauchemin (@mistercrunch)
+- Qingping Hou (@houqp)
+- Ry Walker (@ryw)
+- Ryan Hamilton (@ryanahamilton)
+- Siddharth "Sid" Anand (@r39132)
+- Sumit Maheshwari (@msumit)
+- Tao Feng (@feng-tao)
+- Tomasz Urbaszek (@turbaszek)
+- Vikram Koka (@vikramkoka)
+- Xiaodong Deng (@XD-DENG)
 
 For the full list of contributors, take a look at `Airflow's GitHub
 Contributor page:

[airflow] 21/36: Add new committers (#14544)

Posted by as...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ash pushed a commit to branch v2-0-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 4ce74f80b61ff15439e23c8bb2f223cdcbc36025
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Mon Mar 1 17:29:30 2021 +0000

    Add new committers (#14544)
    
    https://lists.apache.org/thread.html/r33d43764cfb4a3a5f8e463c543229de3f13ee86a9713e7263ef34d39%40%3Cdev.airflow.apache.org%3E
    (cherry picked from commit eee48761693911ac0aceae21d4c7d6cce36ad5ba)
---
 docs/apache-airflow/project.rst | 3 +++
 docs/spelling_wordlist.txt      | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/docs/apache-airflow/project.rst b/docs/apache-airflow/project.rst
index 801d4c0..bd5bb37 100644
--- a/docs/apache-airflow/project.rst
+++ b/docs/apache-airflow/project.rst
@@ -47,11 +47,14 @@ Committers
 - Chris Riccomini (@criccomini)
 - Dan Davydov (@aoen)
 - Daniel Imberman (@dimberman)
+- Daniel Standish (@dstandish)
+- Elad Kalif (@eladkal)
 - Ephraim Anierobi (@ephraimbuddy)
 - Felix Uellendall (@feluelle)
 - Fokko Driesprong (@fokko)
 - Hitesh Shah (@hiteshs)
 - Jakob Homan (@jghoman)
+- James Timmins (@jhtimmins)
 - Jarek Potiuk (@potiuk)
 - Jiajie Zhong (@zhongjiajie)
 - Joshua Carp (@jmcarp)
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index 34506e9..784c1bd 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -123,6 +123,7 @@ Dynamodb
 EDITMSG
 ETag
 Eg
+Elad
 EmrAddSteps
 EmrCreateJobFlow
 Enum
@@ -207,6 +208,7 @@ Json
 Jupyter
 KYLIN
 Kalibrr
+Kalif
 Kamil
 Kaxil
 Kengo
@@ -341,6 +343,7 @@ Sqlite
 Sqoop
 Stackdriver
 Standarization
+Standish
 StatsD
 Statsd
 StoredInfoType
@@ -367,6 +370,7 @@ Terraform
 TextToSpeechClient
 Tez
 Thinknear
+Timmins
 ToC
 Tomasz
 Tooltip
@@ -690,6 +694,7 @@ dropdown
 druidHook
 ds
 dsn
+dstandish
 dttm
 dtypes
 durations
@@ -701,6 +706,7 @@ ec
 ecb
 editorconfig
 eg
+eladkal
 elasticsearch
 emr
 enableAutoScale
@@ -881,6 +887,7 @@ jdbc
 jdk
 jenkins
 jghoman
+jhtimmins
 jinja
 jira
 jitter