You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/02 16:21:45 UTC

[GitHub] [airflow] potiuk opened a new pull request #13440: Additional properties should be allowed in provider schema

potiuk opened a new pull request #13440:
URL: https://github.com/apache/airflow/pull/13440


   The additional properties should be allowed in provider schema,
   otherwise future version of providers will not be compatible with
   older versions of Airflow.
   
   Specifying 'additionalProperties' as allowed we are opening up to
   adding more properties to provider.yaml.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754697726


   > We can look at the folders under docs-arcivie -- i.e. do a listdir on https://github.com/apache/airflow-site/tree/master/docs-archive/apache-airflow-providers-apache-hive.
   
   Yes and no. That would require that every time we want to build documentation, we have that two repositories available. This is quite problematic because it would require some extra work from the user and the repository is quite large. Additionally, it would make the process of publishing a new package more difficult, as you would have to follow 4 steps instead of 2.
   1. build documentation for all new packages. 
   2. copy files to `airflow-site` repository.
   3. build documentation with the package list.
   4. Copy files for one documentation package only.
   This adds so much complexity that I already preferred to keep this information in provider.yaml.  This is a much lower cost than maintaining a documentation build process that requires two repositories.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754585936


   Thinking about this more, I'm not sure we even need the "doc build only" information in these files -- we have `docs/apache-airflow-providers-$x/` where we could store this information.
   
   Or it could live in the airflow-site repo. My thinking there is that it is only used for building docs, so having to make extra commits in the apache/airflow monorepo seems un-necessary given airflow-site is what needs that information.
   
   Happy to make these changes myself if you both agree with this approach @mik-laj @potiuk.
   
   In summary:
   
   -Have providers.yaml file only be what is needed at runtime for Airflow
   - Move the information used to build the docs/site to somewhere in to the airflow-site repo (versions, logo, etc)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754674602


   > Which information must live in the providers folder? I'm not talking about removing the provider.yaml, just removing the fields from it that are only used for building the site (i.e. at least logo, versions.)
   
   The logo has been added to this repository to make it easier to update the website. Now as new integration is added it is easy to add a logo as well.  In the next step, when the documentation is published, the website is also updated automatically.  I hope that soon each integration will have a logo and will therefore be promoted on the website as soon as it is published  The previous website could be out of date for a very long time because it was a very tedious and long process, and now each contributor can only do one integration.
   See: https://github.com/apache/airflow/issues/13110
   
   As for the versions, deleting them will also be problematic, because it is used to generate the package list  
   http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow-providers/packages-ref.html
   Deleting this field will mean that we need to figure out a way to get the version information from airflow-site in this repository.  Currently, we only sync information from airflow to airflow-site. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753632936


   All seems to be working :) https://github.com/apache/airflow/pull/13440/checks?check_run_id=1639652121


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753879145


   > I am using jsonpath-ng for that - very neat library that implements jsonpath and we can easily specify the path to filter out from Dictionary object (in case of logo simply integrations..logo does the trick
   
   This can cause some problems. Although it is a devel dependency, these dependencies appear in the constraints.txt file, which may make installing a new version of this library difficult in the future.  Google has a slightly different policy on adding dependencies to their libraries, which seems sensible to apply to our project as well, as in some cases this is how our project should be treated.
   
   > The feature must not add unnecessary dependencies (where "unnecessary" is of course subjective, but new dependencies should be discussed).
   
   https://github.com/googleapis/python-bigquery/blob/master/CONTRIBUTING.rst
   
   Another example is the `beautifulsoup4` library, which even has version requirements in `setup.py`
   https://github.com/apache/airflow/blob/4437137effd2bedc3f9c23391d06316480aa3727/setup.py#L451
   As a result, the user may have to use a version that was released 1 year, 11 months ago - 4.7.1.
   https://github.com/apache/airflow/blob/38fbcadf2f443d5bf17e0813f629b39f5627d13e/constraints-3.8.txt#L80
   Latest version is 4.9.3 (3 months ago)
   
   However, we can address this issue when we get a report about it. I just wanted to warn you against adding development dependencies if it's not necessary.
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754606737


   The problem with this information that it is version dependendent and ti MUST be in the provider folder. This information will change over time, when we add new folders, packages etc. We actually even run pre-commit checks that verify if the code is consistent with this information. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753493964


   I've found a potential problem with providers.yaml and schema. Seems that providers that next wave of providers we are going to release will not be compatible with 2.0.0 because they already have 'logo' added as additional property in yaml and providers will not be able to register themselves (schema will fail validation)
   
   Changing additionalProeperties to "true" solves the problem in 'forward-compatible' way, however this does not help for 2.0.0 users who won't be able to discover future providers.
   
   Taking into account, however that 2.0.0 has a number of problems and 2.0.1 is going to be really the first 'stable` release I think, it might make sense to add >=2.0.1 limitation in the future providers (I guess that might motivate users of 2.0.0 to upgrade to 2.0.1). We could also yank 2.0.0 release (I think we should do it regardless with the number of small, but annoying problems we have). 
   
   Let me know WDYT.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753493964


   I've found a potential problem with providers.yaml and schema. Seems that providers that next wave of providers we are going to release will not be compatible with 2.0.0 because they already have 'logo' added as additional property in yaml and providers will not be able to register themselves (schema will fail validation)
   
   Changing additionalProeperties to "true" solves the problem in 'future-compatible' way, however this does not help for 2.0.0 users who won't be able to discover future providers.
   
   Taking into account, however that 2.0.0 has a number of problems and 2.0.1 is going to be really the first 'stable` release I think, it might make sense to add >=2.0.1 limitation in the future providers (I guess that might motivate users of 2.0.0 to upgrade to 2.0.1). We could also yank 2.0.0 release (I think we should do it regardless with the number of small, but annoying problems we have). 
   
   Let me know WDYT.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #13440:
URL: https://github.com/apache/airflow/pull/13440


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754606737


   The problem with this information that it is version dependendent and ti MUST be in the providers folder. This information will change over time, when we add new folders, packages etc. We actually even run pre-commit checks that verify if the code is consistent with this information. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753973188


   Revert is perhaps the wrong word -- but separating the two purposes more clearly -- and not changing the "runtime" json schema.
   
   One option might be to have two YAML documents - one for provider info, one for docs. This could be two files, or it could be two "YAML documents" in the same file.
   
   ```yaml
   ---
   package-name: apache-airflow-providers-apache-cassandra
   name: Apache Cassandra
   description: |
       `Apache Cassandra <http://cassandra.apache.org/>`__.
   
   # ...
   ---
   logo: X
   versions:
     - 1.0.0
   ```
   
   
   (Because `versions` is also not a "runtime" property, but docs only


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753992573


   I very much like the idea of @mik-laj of splitting the schemas rather than files. This should be bullet-proof for both cases and is far less maintenance.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754611309


   Which information must live in the providers folder? I'm not talking about removing the provider.yaml, just removing the fields from it that are only used for building the site


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754585936


   Thinking about this more, I'm not sure we even need the "doc build only" information in these files -- we have `docs/apache-airflow-providers-$x/` where we could store this information.
   
   Or it could live in the airflow-site repo. My thinking there is that it is only used for building docs, so having to make extra commits in the apache/airflow monorepo seems un-necessary given airflow-site is what needs that information.
   
   Happy to make these changes myself if you both agree with this approach @mik-laj @potiuk.
   
   In summary:
   
   - Have providers.yaml file only be what is needed at runtime for Airflow
   - Move the information used to build the docs/site to somewhere in to the airflow-site repo (versions, logo, etc)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753990787


   There are many more columns that are not used by Airflow but are only used by documentation. It seems to me that these will be all on this code block.
   https://github.com/apache/airflow/blob/3341d210cd265b51175d8f575e2ee73aa247f28b/airflow/provider.yaml.schema.json#L17-L180
   However, splitting this into several files seems to me to cause maintenance issues, but we can try to define one schema file that will contain all the information and have ``additionalProperties`` field set to `false` and a second specification that will be a subset of the first. This one can be used in runtime and can be forward-compatible.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753614997


   @ashb @kaxil @mik-laj -> I've implemented the "removal" of extra properties:
   
   * i remove the extra fields added since 2.0.0 (I am using jsonpath-ng for that - very neat library that implements jsonpath and we can easily specify the path to filter out from Dictionary object (in case of logo simply `integrations..logo` does the trick). 
   * when preparing provider packages I check if provider info modified vlidates with the old schema 2.0 (stored in deprecated_schemas)
   * I added test in CI to install and import all classes from master providers using 2.0.0 airflow rather than master version. This will help us to validate  if providers are still installable with 'old' Airflow 2.0.0. That's actually pretty nice as we will be pretty sure that at least at the import and dependencies level we got things right in master. If we decide to deprecate 2.0.0 we can move it to 2.0.1


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753499611


   As discussed with @mik-laj -> we might also for the time being remove the new fields + fix 2.0.1 and when the time comes we yank (or are confident 2.0.0 is not used) we can stop removing the fields and add >= 2.0.1 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754611309


   Which information must live in the providers folder? I'm not talking about removing the provider.yaml, just removing the fields from it that are only used for building the site (i.e. at least logo, versions.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754677416


   As discussed before - the #13488 with separate runtime schema. I think it is cleanest and best approach. I very much like the idea that all provider info is in one place and then we can split out only the information that is need at runtime. This is way better than splitting off the files and much more logical approach IMHO, especially that we can do the validation and that some information (like versions, package name, description ) are shared between runtime and doc.
   
   I am not sure even why we would want to split them in this case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753990787


   There are many more columns that are not used by Airflow but are only used by documentation. It seems to me that these will be all on this code block.
   https://github.com/apache/airflow/blob/3341d210cd265b51175d8f575e2ee73aa247f28b/airflow/provider.yaml.schema.json#L17-L180
   However, splitting this into several files seems to me to cause maintenance issues, but we can try to define one schema file that will contain all the information and have ``additionalProperties`` field set to `true` and a second specification that will be a subset of the first. This one can be used in runtime and can be extended as needed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753990621


   We could indeed separate those two. I have no problem with that, and I think two files might be enough. Happy to make that change.  And we can rather easily change that even now (I can make the change in couple of days).
   
   But I believe 'additionalProperties' SHOULD be true. If we want to validate 3rd-party customer providers (we want) if we want to maintain forward-compatibility. Otherwise (as in the cases we had) we shut the door to adding new features.
   
   Rather than 'additionalProperties' set to false, I think much better are "required" for the important stuff and making things require when they become indespensible (in which case we make a breaking change and make other fields required). This is very much principle in any kind of protocols like GRPC and others that new fields in the protocol should be pretty much ignored if the other end does not know them, precisely for forward-compatibility reason.
   
   I think this information is static enough that typos in "new" fields are not very likely to happen, and for anyone using IDEs, they will get auto-completion when preparing the files. I would not worry about it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753615705


   We could also add similar mechansm for 'customized_form_field_behaviour' schema., though I do not expect any changes there, and we can always add it if we decide to.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753993017


   > it could be two "YAML documents" in the same file.
   
   This is poorly supported by additional tools like the IDE.  They often validate the entire file against one specification.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754677309


   > As for the versions, deleting them will also be problematic, because it is used to generate the package list
   > http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow-providers/packages-ref.html
   > Deleting this field will mean that we need to figure out a way to get the version information from airflow-site in this repository. Currently, we only sync information from airflow to airflow-site.
   
   We can look at the folders under docs-arcivie -- i.e. do a listdir on https://github.com/apache/airflow-site/tree/master/docs-archive/apache-airflow-providers-apache-hive.
   
   Anyway, I think this is probably easier to look at in code than abstract (and yes, I may just not know enough of what is used where). I'll PR or shut up :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753962424


   >Seems that providers that next wave of providers we are going to release will not be compatible with 2.0.0 because they already have 'logo' added as additional property in yaml and providers will not be able to register themselves (schema will fail validation)
   
   `logo` property is not relevant to runtime code at all -- that is only used for building our docs/website. This change (which i have no problem with in principle) is caused by using one file for two purposes.
   
   I'm not fan of setting additionalProperties to true in general -- it means you can make silly mistakes like typoing a field name and not get any protection.
   
   I'd like to discus reverting this change (or at least the schema deprecation part), as my understanding of this is we are changing the provider yaml schema to support something "outside" of Airflow code.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753990787


   There are many more columns that are not used by Airflow but are only used by documentation. It seems to me that these will be all on this code block.
   https://github.com/apache/airflow/blob/3341d210cd265b51175d8f575e2ee73aa247f28b/airflow/provider.yaml.schema.json#L17-L180
   However, splitting this into several files seems to me to cause maintenance issues, but we can try to define one schema file that will contain all the information and have ``additionalProperties`` field set to `true` and a second specification that will be a subset of the first. This one can be used in runtime and can be forward-compatible.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-753614997


   @ashb @kaxil @mik-laj -> I've implemented the "removal" of extra properties:
   
   * i remove the extra fields added since 2.0.0 (I am using jsonpath-ng for that - very neat library that implements jsonpath and we can easily specify the path to filter out from Dictionary object (in case of logo simply `integrations..logo` does the trick). 
   * when preparing provider packages I check if provider info modified matches the old schema 2.0 (stored in deprecated_schemas)
   * I added test in CI to install and import all classes from master providers using 2.0.0 airflow rather than master version. This will help us to validate  if providers are still installable with 'old' Airflow 2.0.0. That's actually pretty nice as we will be pretty sure that at least at the import and dependencies level we got things right in master. If we decide to deprecate 2.0.0 we can move it to 2.0.1


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754015480


   @mik-laj - answering your comment here: 
   
   > This can cause some problems. Although it is a devel dependency, these dependencies appear in the constraints.txt file, which may make installing a new version of this library difficult in the future. Google has a slightly different policy on adding dependencies to their libraries, which seems sensible to apply to our project as well, as in some cases this is how our project should be treated.
   
   I think adding jsonpath-ng is pretty much equivalent of adding jsonschema - they are both part of the same set of standards that complement each other (and they both implement established and popular standards). JsonSchema and JSonPath are different side of the same coin (same as XMLSchema and XMLPath). And they go hand-in-hand. If we installed one, installing the other is no brainer.
   
   As far as old requirements - first of all, as a developer you are not supposed to install 'devel` in a released version, you only should use in airflow code checked out from master. And the way our constraints work that when package gets no upper bound, the constraints will get updated rather quickly after new version of that package gets released and our tests in master pass. Without anyone's involvement. So what you can expect for such dependencies you will have always latest 'good' version in the master constraints.
   
   The 'jsonpath-ng' has no upper bound, so in this cases constraints mechanism will automatically upgrade "master' whenever the 'jsonpath-ng' new version will be released (and all tests pass). Which means that in master constraints we will not get "11 months old" requirements. 
   
   This is solution is really great - we have not only `latest` but also `tested` version of the software (And we have all dependencies tested in sync together). Usually you can get only latest when you use poetry or pip-tools, or single latest version PR when you use dependabot.
   
   What our system provides is better because it figures the "latest" set of constraints that includes all our requirements (both core and providers) and figures out the latest "set of those", Tests them, and only pushes them automatically when all tests passed. There is no other solution I looked at (and believe me I looked at many) that can provide that. Especially that dependabot does not cope well with the situation that we do not want to have fixed requirements but we generate the expected set of constraints via PIP automatically. None of the google libraries has this probl,
   
   What's even more, even if we want to add a patch to a released version, we will use (by using constraints) the version of devel dependency which was OK for that version. We can still upgrade it (and this BTW also happens in v1-10-test/stable branch) so that if we make a patch to an old version, new constraints (with possibly updated version of the dependency) will be updated (but again - only after all tests pass).  And if other packages/providers of our have another limitation here, this is also fine, because we should not have any dependency that community-managed providers should conflict with. It's community responsibility to keep both core and community managed providers working together without conflicts.
   
   > Another example is the `beautifulsoup4` library, which even has version requirements in `setup.py`
   
   The `beautifulsoup4==4.7.1` is only  that old in constraints because somebody put it like that in the devel requirements and it should be fixed. There is no need why devel requirements should have any limitations unless there is  a very good reason for (it and there is a condition in the comment on when we will be able to remove the limitation). I have no idea who and how added it this way, and I hope it can be removed (but I have no idea who and how added it - maybe you can try to fix that separately)?
   
   I think our solution is state of the art and we are solving a lot of problems with dependencies in a better way than many of other applications and librarieres. And the fact that we are both - applications and library, leads to custom, complex solution. But I have not seen any other (including pip-tools, or poetry) that would solve both approaches at the same time. Our solution does and rather reliably now.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13440: Additional properties should be allowed in provider schema

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13440:
URL: https://github.com/apache/airflow/pull/13440#issuecomment-754650734


   > Which information must live in the providers folder? I'm not talking about removing the provider.yaml, just removing the fields from it that are only used for building the site (i.e. at least logo, versions.)
   
   The checks performed start here: https://github.com/apache/airflow/blob/master/scripts/ci/pre_commit/pre_commit_check_provider_yaml_files.py#L151
   
   They are validating a number of things to make sure that when provider.yaml file will be used, the generated documentation is consistent with the code.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org