You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/05 14:34:37 UTC

[GitHub] [airflow] potiuk opened a new pull request #13488: Introduces separate runtime provider schema

potiuk opened a new pull request #13488:
URL: https://github.com/apache/airflow/pull/13488


   The provider.yaml contains more information that required at
   runtime (specifically about documentation building). Those
   fields are not needed at runtime and their presence is optional.
   Also the runtime check for provider information should be more
   relexed and allow for future compatibility (with
   additional properties set to false). This way we can add new,
   optional fields to provider.yaml without worrying about breaking
   future-compatibility of providers with future airflow versions.
   
   This changei restores 'additionalProperties': false in the
   main, development-focused provider.yaml schema and introduced
   new runtime schema that is used to verify the provider info when
   providers are discovered by airflow.
   
   This 'runtime' version should change very rarely as change to
   add a new required property in it breaks compatibility of
   providers with already released versions of Airflow.
   
   We also trim-down the provider.yaml file when preparing provider
   packages to only contain those fields that are required in the
   runtime schema.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-757339139


   [The Workflow run](https://github.com/apache/airflow/actions/runs/474339957) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-754673117


   cc: @mik-laj  - for some reason I cannot add you as reviewer? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-757327214


   [The Workflow run](https://github.com/apache/airflow/actions/runs/474241021) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r553060489



##########
File path: dev/provider_packages/prepare_provider_packages.py
##########
@@ -1152,39 +1156,78 @@ def _load_schema() -> Dict[str, Any]:
     except jsonschema.ValidationError as e:
         raise Exception(
             "Error when validating schema. The schema must be Airflow 2.0.0 compatible. "
-            "If you added any fields please remove them via 'remove_extra_fields' method.",
+            "If you added any fields please remove them via 'convert_to_provider_info' method.",
             e,
         )
 
 
-def remove_logo_field(original_provider_info: Dict[str, Any]):
-    updated_provider_info = deepcopy(original_provider_info)
-    expression = jsonpath_ng.parse("integrations..logo")
-    updated_provider_info = expression.filter(lambda x: True, updated_provider_info)
-    return updated_provider_info
+def validate_provider_info_with_runtime_schema(provider_info: Dict[str, Any]) -> None:
+    """
+    Validates provider info against the runtime schema. This way we check if the provider info in the
+    packages is future-compatible. The Runtime Schema should only change when there is a major version
+    change.
+
+    :param provider_info: provider info to validate
+    """
+
+    def _load_schema() -> Dict[str, Any]:
+        with open(PROVIDER_RUNTIME_DATA_SCHEMA_PATH) as schema_file:
+            content = json.load(schema_file)
+        return content

Review comment:
       I really think a function is not required here:
   
   ```suggestion
       with open(PROVIDER_RUNTIME_DATA_SCHEMA_PATH) as schema_file:
           content = json.load(schema_file)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r554234690



##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -196,9 +197,9 @@ function rename_packages_if_needed() {
         fi
     fi
 
-    popd
+    popd >/dev/null
     echo
-    echo "Airflow packages are in dist folder "
+    echo "${COLOR_GREEN_OK} Airflow packages are prepared in dist folder${COLOR_RESET}"

Review comment:
       ```suggestion
       echo "${COLOR_GREEN}OK Airflow packages are prepared in dist folder${COLOR_RESET}"
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r553247684



##########
File path: dev/provider_packages/prepare_provider_packages.py
##########
@@ -1152,39 +1156,78 @@ def _load_schema() -> Dict[str, Any]:
     except jsonschema.ValidationError as e:
         raise Exception(
             "Error when validating schema. The schema must be Airflow 2.0.0 compatible. "
-            "If you added any fields please remove them via 'remove_extra_fields' method.",
+            "If you added any fields please remove them via 'convert_to_provider_info' method.",
             e,
         )
 
 
-def remove_logo_field(original_provider_info: Dict[str, Any]):
-    updated_provider_info = deepcopy(original_provider_info)
-    expression = jsonpath_ng.parse("integrations..logo")
-    updated_provider_info = expression.filter(lambda x: True, updated_provider_info)
-    return updated_provider_info
+def validate_provider_info_with_runtime_schema(provider_info: Dict[str, Any]) -> None:
+    """
+    Validates provider info against the runtime schema. This way we check if the provider info in the
+    packages is future-compatible. The Runtime Schema should only change when there is a major version
+    change.
+
+    :param provider_info: provider info to validate
+    """
+
+    def _load_schema() -> Dict[str, Any]:
+        with open(PROVIDER_RUNTIME_DATA_SCHEMA_PATH) as schema_file:
+            content = json.load(schema_file)
+        return content

Review comment:
       Sure. Copied it from `doc/utils` but yeah inlining makes perfect sense here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r554235409



##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -167,7 +167,8 @@ function build_provider_packages() {
             cat "${LOG_FILE}"
             exit "${RES}"
         fi
-        echo " Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}"
+        echo "==================================================================================="
+        echo "${COLOR_GREEN_OK} Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}${COLOR_RESET}"

Review comment:
       ```suggestion
           echo "${COLOR_GREEN}OK Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}${COLOR_RESET}"
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-757468519


   @kaxil -> all fixed and only quarantined tests failing 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk removed a comment on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk removed a comment on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-756420225


   Hey @ashb  - any news about the self-hosted runner changes :). I think we badly, badly need it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-757036745






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r553246189



##########
File path: airflow/providers_manager.py
##########
@@ -34,14 +34,14 @@
     import importlib.resources as importlib_resources
 except ImportError:
     # Try back-ported to PY<37 `importlib_resources`.
-    import importlib_resources
+    import importlib_resources  # noqa

Review comment:
       The MyPy real-time scan integration complains about it in the IDE. Not sure why, but I guess it has something to do with the weird treatment of importlib_resources for different python versions. Adding # noqa is safe here and solves the problem:
   
   ![Screenshot from 2021-01-07 11-35-22](https://user-images.githubusercontent.com/595491/103882694-7b80d500-50dc-11eb-9c28-08a1012d2206.png)
   
   It's my quest to have a green checkmark for all python files.
   
   ![Screenshot from 2021-01-07 11-36-19](https://user-images.githubusercontent.com/595491/103882773-95221c80-50dc-11eb-8c8a-192e903b1dbf.png)
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r551975860



##########
File path: MANIFEST.in
##########
@@ -32,7 +32,7 @@ global-exclude __pycache__  *.pyc
 include airflow/alembic.ini
 include airflow/api_connexion/openapi/v1.yaml
 include airflow/git_version
-include airflow/provider.yaml.schema.json
+include airflow/provider.runtime.yaml.schema.json

Review comment:
       Would we "invert" this naming? the only one we ship in dist should be the runtime, and the current rename `provider.yaml.schema.json` to something like `provider.development.yaml.schema.json`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r554440949



##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -196,9 +197,9 @@ function rename_packages_if_needed() {
         fi
     fi
 
-    popd
+    popd >/dev/null
     echo
-    echo "Airflow packages are in dist folder "
+    echo "${COLOR_GREEN_OK} Airflow packages are prepared in dist folder${COLOR_RESET}"

Review comment:
       Nohing to apologise for :).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-757036745


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r552802339



##########
File path: airflow/provider_info.schema.json
##########
@@ -0,0 +1,45 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "type": "object",
+  "properties": {
+    "package-name": {
+      "description": "Package name available under which the package is available in the PyPI repository.",
+      "type": "string"
+    },
+    "name": {
+      "description": "Provider name",
+      "type": "string"
+    },
+    "description": {
+      "description": "Information about the package in RST format",
+      "type": "string"
+    },
+    "versions": {
+      "description": "List of available versions in Pypi. Sorted descending according to release date.",
+      "type": "array",
+      "items": {
+        "type": "string"
+      }
+    },

Review comment:
       Good point. It does not have to be. I will remove!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-754675637


   As discussed before - the #13488 with separate runtime schema. I think it is cleanest and best approach. I very much like the idea that all provider info is in one place and then we can split out only the information that is need at runtime. This is way better than splitting off the files and much more logical approach IMHO, especially that we can do the validation and that some information (like versions, package name, description ) are shared between runtime and doc.
   
   I am not sure even why we would want to split them in this case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-756952352


   @kaxil @ashb -> ready to go :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-756952352


   @kaxil @ashb -> ready to go :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r554235494



##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -167,7 +167,8 @@ function build_provider_packages() {
             cat "${LOG_FILE}"
             exit "${RES}"
         fi
-        echo " Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}"
+        echo "==================================================================================="
+        echo "${COLOR_GREEN_OK} Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}${COLOR_RESET}"

Review comment:
       Feel free to discard review after making these changes




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r553059874



##########
File path: airflow/providers_manager.py
##########
@@ -34,14 +34,14 @@
     import importlib.resources as importlib_resources
 except ImportError:
     # Try back-ported to PY<37 `importlib_resources`.
-    import importlib_resources
+    import importlib_resources  # noqa

Review comment:
       why do we need this now -- since it was not present earlier?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-756478663


   Looks like it's going to be green (except quarantined build) looking forward to approval!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r554234690



##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -196,9 +197,9 @@ function rename_packages_if_needed() {
         fi
     fi
 
-    popd
+    popd >/dev/null
     echo
-    echo "Airflow packages are in dist folder "
+    echo "${COLOR_GREEN_OK} Airflow packages are prepared in dist folder${COLOR_RESET}"

Review comment:
       ```suggestion
       echo "${COLOR_GREEN}OK Airflow packages are prepared in dist folder${COLOR_RESET}"
   ```

##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -196,9 +197,9 @@ function rename_packages_if_needed() {
         fi
     fi
 
-    popd
+    popd >/dev/null
     echo
-    echo "Airflow packages are in dist folder "
+    echo "${COLOR_GREEN_OK} Airflow packages are prepared in dist folder${COLOR_RESET}"

Review comment:
       Apologies, this is needed coz I just merged https://github.com/apache/airflow/commit/9d6c7487dd03a42d1b6a1f19fae0c9a746139dbe

##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -167,7 +167,8 @@ function build_provider_packages() {
             cat "${LOG_FILE}"
             exit "${RES}"
         fi
-        echo " Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}"
+        echo "==================================================================================="
+        echo "${COLOR_GREEN_OK} Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}${COLOR_RESET}"

Review comment:
       ```suggestion
           echo "${COLOR_GREEN}OK Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}${COLOR_RESET}"
   ```

##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -167,7 +167,8 @@ function build_provider_packages() {
             cat "${LOG_FILE}"
             exit "${RES}"
         fi
-        echo " Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}"
+        echo "==================================================================================="
+        echo "${COLOR_GREEN_OK} Prepared ${PACKAGE_TYPE} package ${PROVIDER_PACKAGE} format ${PACKAGE_FORMAT}${COLOR_RESET}"

Review comment:
       Feel free to discard review after making these changes




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r551999206



##########
File path: MANIFEST.in
##########
@@ -32,7 +32,7 @@ global-exclude __pycache__  *.pyc
 include airflow/alembic.ini
 include airflow/api_connexion/openapi/v1.yaml
 include airflow/git_version
-include airflow/provider.yaml.schema.json
+include airflow/provider.runtime.yaml.schema.json

Review comment:
       I thought about it but I'd rather keep the name 'provider.yaml.schema.json` because all the IDE tools (and people matching them) would rather use provider.yaml.json.schema to edit provider.yaml. And in fact provider.yaml is already development-only. There are no provider.yaml files beside the development.
   
   But  this thought leads me to another, better naming. The `provider.runtime.yaml.schama.json' shoudl be renamed `provider_info.schema.json` . This would be 1-1 with the entry-point name and there should be no yaml, because there is no yaml whatsoever n provider_info entrypoint. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r553061368



##########
File path: docs/apache-airflow-providers/index.rst
##########
@@ -189,8 +189,11 @@ You need to do the following to turn an existing Python package into a provider
 * Add the ``apache_airflow_provider`` entry point in the ``setup.cfg`` - this tells airflow where to get
   the required provider metadata
 * Create the function that you refer to in the first step as part of your package: this functions returns a
-  dictionary that contains all meta-data about your provider package; see also ``provider.yaml``
-  files in the community managed provider packages as examples
+  dictionary that contains all meta-data about your provider package
+* note that the dictionary should be compliant with ``airflow/provider_info.schema.json`` JSON-schema
+  specification, and the community-managed providers have more fields there that are used to build

Review comment:
       ```suggestion
     specification and the community-managed providers have more fields than that is used to build
   ```
   
   I am assuming this is what you meant but feel free to ignore if not




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-757339250


   [The Workflow run](https://github.com/apache/airflow/actions/runs/474357271) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-756198786


   [The Workflow run](https://github.com/apache/airflow/actions/runs/469120334) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r554234890



##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -196,9 +197,9 @@ function rename_packages_if_needed() {
         fi
     fi
 
-    popd
+    popd >/dev/null
     echo
-    echo "Airflow packages are in dist folder "
+    echo "${COLOR_GREEN_OK} Airflow packages are prepared in dist folder${COLOR_RESET}"

Review comment:
       Apologies, this is needed coz I just merged https://github.com/apache/airflow/commit/9d6c7487dd03a42d1b6a1f19fae0c9a746139dbe




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-756420225


   Hey @ashb  - any news about the self-hosted runner changes :). I think we badly, badly need it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-756036757


   All comments addressed. I also added more colors in the output with 'rich' to see more clearly errors/OK status.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r552004447



##########
File path: MANIFEST.in
##########
@@ -32,7 +32,7 @@ global-exclude __pycache__  *.pyc
 include airflow/alembic.ini
 include airflow/api_connexion/openapi/v1.yaml
 include airflow/git_version
-include airflow/provider.yaml.schema.json
+include airflow/provider.runtime.yaml.schema.json

Review comment:
       Yeah. Just pushed it. This is much better - and there is not a single mention of provider.yaml in the user-facing documentation. Also airflow package does not contain provider.yaml schema nor provider.yaml files, so this leaves provider.yaml and provider.yaml.schema.json as development-only-things, as it should be.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r553059565



##########
File path: airflow/provider_info.schema.json
##########
@@ -0,0 +1,45 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "type": "object",
+  "properties": {
+    "package-name": {
+      "description": "Package name available under which the package is available in the PyPI repository.",
+      "type": "string"
+    },
+    "name": {
+      "description": "Provider name",
+      "type": "string"
+    },
+    "description": {
+      "description": "Information about the package in RST format",
+      "type": "string"
+    },
+    "versions": {
+      "description": "List of available versions in Pypi. Sorted descending according to release date.",

Review comment:
       ```suggestion
         "description": "List of available versions in PyPI. Sorted descending according to release date.",
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #13488:
URL: https://github.com/apache/airflow/pull/13488


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r552239985



##########
File path: airflow/provider_info.schema.json
##########
@@ -0,0 +1,45 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "type": "object",
+  "properties": {
+    "package-name": {
+      "description": "Package name available under which the package is available in the PyPI repository.",
+      "type": "string"
+    },
+    "name": {
+      "description": "Provider name",
+      "type": "string"
+    },
+    "description": {
+      "description": "Information about the package in RST format",
+      "type": "string"
+    },
+    "versions": {
+      "description": "List of available versions in Pypi. Sorted descending according to release date.",
+      "type": "array",
+      "items": {
+        "type": "string"
+      }
+    },

Review comment:
       I thought this wasn't part of get_provider_info?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk removed a comment on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk removed a comment on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-754675637


   As discussed before - the #13488 with separate runtime schema. I think it is cleanest and best approach. I very much like the idea that all provider info is in one place and then we can split out only the information that is need at runtime. This is way better than splitting off the files and much more logical approach IMHO, especially that we can do the validation and that some information (like versions, package name, description ) are shared between runtime and doc.
   
   I am not sure even why we would want to split them in this case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r553060643



##########
File path: dev/provider_packages/prepare_provider_packages.py
##########
@@ -1152,39 +1156,78 @@ def _load_schema() -> Dict[str, Any]:
     except jsonschema.ValidationError as e:
         raise Exception(
             "Error when validating schema. The schema must be Airflow 2.0.0 compatible. "
-            "If you added any fields please remove them via 'remove_extra_fields' method.",
+            "If you added any fields please remove them via 'convert_to_provider_info' method.",
             e,
         )
 
 
-def remove_logo_field(original_provider_info: Dict[str, Any]):
-    updated_provider_info = deepcopy(original_provider_info)
-    expression = jsonpath_ng.parse("integrations..logo")
-    updated_provider_info = expression.filter(lambda x: True, updated_provider_info)
-    return updated_provider_info
+def validate_provider_info_with_runtime_schema(provider_info: Dict[str, Any]) -> None:
+    """
+    Validates provider info against the runtime schema. This way we check if the provider info in the
+    packages is future-compatible. The Runtime Schema should only change when there is a major version
+    change.
+
+    :param provider_info: provider info to validate
+    """
+
+    def _load_schema() -> Dict[str, Any]:
+        with open(PROVIDER_RUNTIME_DATA_SCHEMA_PATH) as schema_file:
+            content = json.load(schema_file)
+        return content
+
+    schema = _load_schema()

Review comment:
       ```suggestion
   ```

##########
File path: dev/provider_packages/prepare_provider_packages.py
##########
@@ -1152,39 +1156,78 @@ def _load_schema() -> Dict[str, Any]:
     except jsonschema.ValidationError as e:
         raise Exception(
             "Error when validating schema. The schema must be Airflow 2.0.0 compatible. "
-            "If you added any fields please remove them via 'remove_extra_fields' method.",
+            "If you added any fields please remove them via 'convert_to_provider_info' method.",
             e,
         )
 
 
-def remove_logo_field(original_provider_info: Dict[str, Any]):
-    updated_provider_info = deepcopy(original_provider_info)
-    expression = jsonpath_ng.parse("integrations..logo")
-    updated_provider_info = expression.filter(lambda x: True, updated_provider_info)
-    return updated_provider_info
+def validate_provider_info_with_runtime_schema(provider_info: Dict[str, Any]) -> None:
+    """
+    Validates provider info against the runtime schema. This way we check if the provider info in the
+    packages is future-compatible. The Runtime Schema should only change when there is a major version
+    change.
+
+    :param provider_info: provider info to validate
+    """
+
+    def _load_schema() -> Dict[str, Any]:
+        with open(PROVIDER_RUNTIME_DATA_SCHEMA_PATH) as schema_file:
+            content = json.load(schema_file)
+        return content

Review comment:
       I really think a function is not required here:
   
   ```suggestion
       with open(PROVIDER_RUNTIME_DATA_SCHEMA_PATH) as schema_file:
           schema = json.load(schema_file)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#discussion_r554440949



##########
File path: scripts/in_container/run_prepare_provider_packages.sh
##########
@@ -196,9 +197,9 @@ function rename_packages_if_needed() {
         fi
     fi
 
-    popd
+    popd >/dev/null
     echo
-    echo "Airflow packages are in dist folder "
+    echo "${COLOR_GREEN_OK} Airflow packages are prepared in dist folder${COLOR_RESET}"

Review comment:
       Nohing to apologise for :).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13488: Introduces separate runtime provider schema

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13488:
URL: https://github.com/apache/airflow/pull/13488#issuecomment-757932439


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org