Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/17 14:17:01 UTC

[GitHub] [airflow] kaxil edited a comment on issue #12983: Add support for the new selective docs building in CI build-docs step

kaxil edited a comment on issue #12983:
URL: https://github.com/apache/airflow/issues/12983#issuecomment-761819429


   >All such cross references are already automatically checked (via pre-commit) and stored in https://github.com/apache/airflow/blob/master/airflow/providers/dependencies.json .
   
   This is not the cross-reference I (at least) am talking about. The Google providers, for example, reference the XCom documentation from the apache-airflow (core) docs. Example: https://github.com/apache/airflow/blob/master/docs/apache-airflow-providers-google/operators/marketing_platform/campaign_manager.rst
   
   And similarly, the apache-airflow (core) docs reference `~airflow.providers.http.operators.http.SimpleHttpOperator`, `~airflow.providers.sqlite.operators.sqlite.SqliteOperator`, and `~airflow.providers.jdbc.hooks.jdbc.JdbcHook`.
   
   So, at the very least, we need to build the docs for:
   
   - `apache-airflow`
   - `apache-airflow-providers`
   - `apache-airflow-providers-PROVIDER_THAT_CHANGES`
   
   
   >Now I want us to consider what makes more sense:
   > 1. Optimizing #13706 - doc/*.rst documentation-only change
   > 2. Optimizing docs build per-provider - only one provider code+documentation changes
   
   (2) is not possible for the reason I explained above. We need (3): optimize the docs build to **core** + **apache-airflow-providers** + per-provider, i.e. when only one provider's code and documentation change, build that provider's docs plus the **core** docs plus **apache-airflow-providers**.
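
   To make (3) concrete, here is a minimal sketch of how the set of docs packages to build could be computed from the list of changed files. The helper name and path pattern are illustrative, not the actual CI code, and nested providers such as `apache.hive` would need the real provider list (e.g. from `dependencies.json`):

   ```python
   import re

   # Core docs cross-reference provider docs, so these are always rebuilt.
   ALWAYS_BUILD = {"apache-airflow", "apache-airflow-providers"}

   # Simplified: matches only top-level providers (airflow/providers/google/...);
   # nested ids like apache.hive would need the real provider list.
   PROVIDER_PATH = re.compile(r"airflow/providers/([a-z0-9_]+)/")

   def docs_packages_to_build(changed_files):
       """Return the set of docs packages to rebuild for the given changes."""
       packages = set(ALWAYS_BUILD)
       for path in changed_files:
           match = PROVIDER_PATH.match(path)
           if match:
               packages.add("apache-airflow-providers-" + match.group(1))
       return packages
   ```

   With something like this, a change touching only `airflow/providers/google/...` would rebuild three docs packages instead of all of them.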
   
   
   >I'd argue changing only documentation is a bad smell. Documentation should usually be changed together with the code when code change happens. So I argue that those builds will be extremely rare in the future. And they will mostly not impact the people who are making 'substantial' changes - those who need fast feedback on their builds.
   
   Wrong. We encourage doc-only changes from new contributors, who can add a missing section, fix formatting issues, or correct outdated information; that is definitely not a bad smell. Documentation has to be a first-class citizen and is important for us as a project, all the more so because we are an OSS project. The history of doc-only changes (judging by the number of commits) is also large. And documentation will always keep evolving (getting better), not only now because we released a major version.
   
   >We do not have to build "airflow" docs in this case - there should be "0" references from airflow to particular providers docs.

   I don't think so; it is completely fine to just reference, for example, the Slack provider to explain `sla_miss_callback` functionality or the Secrets Backend.
   
   Example references:
   ```
   best-practices.rst:207:Similarly, if you have a task that starts a microservice in Kubernetes or Mesos, you should check if the service has started or not using :class:`airflow.providers.http.sensors.http.HttpSensor`.
   concepts.rst:434:by pre-installing some :doc:`apache-airflow-providers:index` packages (they are always available no
   concepts.rst:437:- :class:`~airflow.providers.http.operators.http.SimpleHttpOperator` - sends an HTTP request
   concepts.rst:438:- :class:`~airflow.providers.sqlite.operators.sqlite.SqliteOperator` - SQLite DB operator
   concepts.rst:443:additional packages manually (for example ``apache-airflow-providers-mysql`` package).
   concepts.rst:447:- :class:`~airflow.providers.mysql.operators.mysql.MySqlOperator`
   concepts.rst:448:- :class:`~airflow.providers.postgres.operators.postgres.PostgresOperator`
   concepts.rst:449:- :class:`~airflow.providers.microsoft.mssql.operators.mssql.MsSqlOperator`
   concepts.rst:450:- :class:`~airflow.providers.oracle.operators.oracle.OracleOperator`
   concepts.rst:451:- :class:`~airflow.providers.jdbc.operators.jdbc.JdbcOperator`
   concepts.rst:452:- :class:`~airflow.providers.docker.operators.docker.DockerOperator`
   concepts.rst:453:- :class:`~airflow.providers.apache.hive.operators.hive.HiveOperator`
   concepts.rst:454:- :class:`~airflow.providers.amazon.aws.operators.s3_file_transform.S3FileTransformOperator`
   concepts.rst:455:- :class:`~airflow.providers.mysql.transfers.presto_to_mysql.PrestoToMySqlOperator`,
   concepts.rst:456:- :class:`~airflow.providers.slack.operators.slack.SlackAPIOperator`
   concepts.rst:459:at :doc:`apache-airflow-providers:index`.
   concepts.rst:795:``conn_id`` for the :class:`~airflow.providers.postgres.hooks.postgres.PostgresHook` is
   howto/connection.rst:332:can also add a custom provider that adds custom connection types. See :doc:`apache-airflow-providers:index`
   howto/connection.rst:339::py:class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook`.
   howto/connection.rst:350:You can read more about details how to add custom provider packages in the :doc:`apache-airflow-providers:index`
   howto/custom-operator.rst:101:See :doc:`connection` for how to create and manage connections and :doc:`apache-airflow-providers:index` for
   howto/custom-operator.rst:268:is :class:`airflow.providers.google.cloud.sensors.gcs.GCSUploadSessionCompleteSensor`.
   howto/define_extra_link.rst:66:all the operators through an airflow plugin or through airflow providers. You can learn more about it in the
   howto/define_extra_link.rst:67::ref:`plugin example <plugin-example>` and in :doc:`apache-airflow-providers:index`.
   howto/define_extra_link.rst:77:tasks using :class:`~airflow.providers.amazon.aws.transfers.gcs_to_s3.GCSToS3Operator` operator.
   howto/define_extra_link.rst:86:  from airflow.providers.amazon.aws.transfers.gcs_to_s3 import GCSToS3Operator
   howto/define_extra_link.rst:112::class:`~airflow.providers.google.cloud.operators.bigquery.BigQueryExecuteQueryOperator` includes a link to the Google Cloud
   howto/define_extra_link.rst:119:    from airflow.providers.google.cloud.operators.bigquery import BigQueryOperator
   howto/define_extra_link.rst:145:As explained in :doc:`apache-airflow-providers:index`, when you create your own Airflow Provider, you can
   howto/define_extra_link.rst:150:by ``apache-airflow-providers-google`` provider currently:
   howto/define_extra_link.rst:155:      - airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleLink
   howto/define_extra_link.rst:156:      - airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleIndexableLink
   howto/define_extra_link.rst:157:      - airflow.providers.google.cloud.operators.mlengine.AIPlatformConsoleLink
   howto/email-config.rst:76:      email_backend = airflow.providers.sendgrid.utils.emailer.send_email
   installation.rst:88:has a corresponding ``apache-airflow-providers-amazon`` providers package to be installed. When you install
   installation.rst:106:see: :doc:`apache-airflow-providers:index`
   installation.rst:108:For the list of the provider packages and what they enable, see: :doc:`apache-airflow-providers:packages-ref`.
   modules_management.rst:272:    apache-airflow-providers-amazon           | 1.0.0b2
   modules_management.rst:273:    apache-airflow-providers-apache-cassandra | 1.0.0b2
   modules_management.rst:274:    apache-airflow-providers-apache-druid     | 1.0.0b2
   modules_management.rst:275:    apache-airflow-providers-apache-hdfs      | 1.0.0b2
   modules_management.rst:276:    apache-airflow-providers-apache-hive      | 1.0.0b2
   operators-and-hooks-ref.rst:23::doc:`apache-airflow-providers:operators-and-hooks-ref/index`.
   plugins.rst:169:    from airflow.providers.amazon.aws.transfers.gcs_to_s3 import GCSToS3Operator
   production-deployment.rst:762:Some operators, such as :class:`airflow.providers.google.cloud.operators.kubernetes_engine.GKEStartPodOperator`,
   production-deployment.rst:763::class:`airflow.providers.google.cloud.operators.dataflow.DataflowStartSqlJobOperator`, require
   production-deployment.rst:869:If you want to establish an SSH connection to the Compute Engine instance, you must have the network address of this instance and credentials to access it. To simplify this task, you can use :class:`~airflow.providers.google.cloud.hooks.compute.ComputeEngineHook` instead of :class:`~airflow.providers.ssh.hooks.ssh.SSHHook`
   production-deployment.rst:871:The :class:`~airflow.providers.google.cloud.hooks.compute.ComputeEngineHook` support authorization with Google OS Login service. It is an extremely robust way to manage Linux access properly as it stores short-lived ssh keys in the metadata service, offers PAM modules for access and sudo privilege checking and offers nsswitch user lookup into the metadata service as well.
   security/secrets/secrets-backend/index.rst:69:    airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
   upgrading-to-2.rst:117:    from airflow.providers.docker.operators.docker import DockerOperator
   upgrading-to-2.rst:126:automatically installs the ``apache-airflow-providers-docker`` package.
   upgrading-to-2.rst:131:You can read more about providers at :doc:`apache-airflow-providers:index`.
   upgrading-to-2.rst:507:  * You can read more about providers at :doc:`apache-airflow-providers:index`.
   ```
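
   References like the ones above can be collected mechanically. A small sketch (assuming a local checkout with the core docs under `docs/apache-airflow/`; the function name is mine, and nested providers like `apache.hive` show up as just `apache` with this simplified pattern):

   ```python
   import re
   from pathlib import Path

   # Matches the provider segment in references such as
   # :class:`~airflow.providers.http.operators.http.SimpleHttpOperator`
   PROVIDER_REF = re.compile(r"airflow\.providers\.([a-z0-9_]+)\.")

   def providers_referenced(docs_root):
       """Return the provider ids referenced from .rst files under docs_root."""
       providers = set()
       for rst in Path(docs_root).rglob("*.rst"):
           for match in PROVIDER_REF.finditer(rst.read_text(encoding="utf-8")):
               providers.add(match.group(1))
       return providers
   ```

   Running something along these lines over the core docs is what produces lists like the one above, and it shows why a per-provider-only build (option 2) would miss these cross-references.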
   
   >I firmly believe we should optimize those things that have higher impact and bring more benefits. I hate doing micro-optimisations, I always think "long-term"/"high impact" when I am doing it.
   
   We should not fall into the trap of over-optimisation, though. We can still get a good reduction in build time (which will bring more benefits and hopefully higher impact), as we won't be building docs for all providers, only `apache-airflow`, `apache-airflow-providers`, and `apache-airflow-providers-PROVIDER_THAT_CHANGED`.
   
   >WDYT @kaxil @mik-laj which of those are worth it? (1) or (2)? I argue (above) that (2) has much higher impact and brings more benefits to the community as whole. If you think we should do (1) rather than (2), I'd love to hear your line of thought and reasoning why you think it is better to do it.
   
   As I said above, I would vouch for (1) and (3), not (2); or just (3), since it would supersede (1) too.
   
   
   
   

