Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/20 06:53:58 UTC

[GitHub] [airflow] potiuk commented on a change in pull request #10413: Add documentation for preparing database for Airflow

potiuk commented on a change in pull request #10413:
URL: https://github.com/apache/airflow/pull/10413#discussion_r473664358



##########
File path: docs/howto/initialize-database.rst
##########
@@ -48,11 +48,48 @@ SqlAlchemy backend. We recommend using **MySQL** or **Postgres**.
    want to set a default schema for your role with a
    command similar to ``ALTER ROLE username SET search_path = airflow, foobar;``
 
+Set up your database to host Airflow
+------------------------------------
+
+Create a database called ``airflow`` and a database user that Airflow
+will use to access this database.
+
+Example, for **MySQL**:
+
+.. code-block:: sql
+
+   CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci;
+   CREATE USER 'airflow' IDENTIFIED BY 'airflow';
+   GRANT ALL PRIVILEGES ON airflow.* TO 'airflow';
+
+Example, for **Postgres**:
+
+.. code-block:: sql
+
+   CREATE DATABASE airflow;
+   CREATE USER airflow WITH PASSWORD 'airflow';
+   GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
+
+You may need to update your Postgres ``pg_hba.conf`` to add the
+``airflow`` user to the database access control list, and then reload
+the database configuration to apply your change. See
+`The pg_hba.conf File <https://www.postgresql.org/docs/current/auth-pg-hba-conf.html>`__
+in the Postgres documentation to learn more.
+
+Configure Airflow's database connection string
+----------------------------------------------
+
 Once you've set up your database to host Airflow, you'll need to alter the
-SqlAlchemy connection string located in your configuration file
-``$AIRFLOW_HOME/airflow.cfg``. You should then also change the "executor"
-setting to use "LocalExecutor", an executor that can parallelize task
-instances locally.
+SqlAlchemy connection string in the ``sql_alchemy_conn`` option of the ``[core]`` section in your configuration file
+``$AIRFLOW_HOME/airflow.cfg``.
+
+Configure a worker that supports parallelism
+--------------------------------------------
+
+You should then also change the ``executor`` option in the ``[core]`` section to use ``LocalExecutor``, an executor that can parallelize task instances locally.

Review comment:
       Maybe 
   ```suggestion
   You should then also change the ``executor`` option in the ``[core]`` section so that it does not use ``SequentialExecutor``, which cannot parallelize task instance execution. Use ``LocalExecutor`` if you want to run tasks on a single machine, or, for example, ``CeleryExecutor``, ``KubernetesExecutor`` or ``DaskExecutor`` if you distribute your tasks.
   ```
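
   A minimal sketch of the check this suggestion implies, assuming the Airflow 1.10-style `[core]` section and plain `airflow.cfg` text. The helper name `check_core_section` is hypothetical; it only parses the file with Python's `configparser` and never calls Airflow. A missing option is treated as Airflow's shipped default (a SQLite connection and `SequentialExecutor`).

```python
import configparser


def check_core_section(cfg_text: str) -> list:
    """Return problems found in the [core] section of airflow.cfg text.

    Hypothetical helper: it only parses text and never imports Airflow.
    Missing options are treated as Airflow's shipped defaults.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    problems = []

    # Airflow's default metadata database is a local SQLite file.
    conn = parser.get("core", "sql_alchemy_conn", fallback="sqlite://")
    if conn.startswith("sqlite"):
        problems.append("sql_alchemy_conn still points at SQLite")

    # SequentialExecutor runs one task instance at a time.
    executor = parser.get("core", "executor", fallback="SequentialExecutor")
    if executor == "SequentialExecutor":
        problems.append("SequentialExecutor cannot run task instances in parallel")
    return problems


if __name__ == "__main__":
    example = """
[core]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
executor = LocalExecutor
"""
    print(check_core_section(example))  # []
```

   The check deliberately does not validate executor names beyond `SequentialExecutor`, since custom executors can also be configured.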

##########
File path: docs/howto/initialize-database.rst
##########
@@ -48,11 +48,48 @@ SqlAlchemy backend. We recommend using **MySQL** or **Postgres**.
+   CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Review comment:
       ```suggestion
      CREATE DATABASE airflow CHARACTER SET utf8mb3 COLLATE utf8_unicode_ci;
   ```
   Additionally, we must add a section that describes how to configure Airflow with the MySQL `utf8mb4` character set.
   
   Airflow will not work with the `utf8mb4` character set without additional configuration. The indexes in a few tables become too large for it: the maximum index key size in MySQL 5.7 and 8.0 is 3072 bytes, and we exceed that limit with `utf8mb4`, because each character in `utf8mb4` can take up to 4 bytes rather than 3 bytes in `utf8mb3` (see https://dev.mysql.com/doc/refman/8.0/en/innodb-limits.html).
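
   To make the arithmetic concrete, here is a sketch with a made-up composite index over four `VARCHAR(250)` columns (hypothetical, not taken from Airflow's actual schema). It assumes the 3072-byte limit applies to the combined key, as it does for InnoDB tables with the `DYNAMIC` or `COMPRESSED` row format, and that MySQL sizes each `VARCHAR` key part by the maximum bytes per character of its character set.

```python
# Hypothetical illustration: the column lengths are made up, not Airflow's schema.
MAX_INDEX_KEY_BYTES = 3072  # InnoDB limit for DYNAMIC/COMPRESSED row formats
MAX_BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}


def index_key_bytes(varchar_lengths, charset):
    """Worst-case key size: each VARCHAR(n) key part reserves n * max bytes/char."""
    return sum(n * MAX_BYTES_PER_CHAR[charset] for n in varchar_lengths)


def fits(varchar_lengths, charset):
    return index_key_bytes(varchar_lengths, charset) <= MAX_INDEX_KEY_BYTES


if __name__ == "__main__":
    columns = [250, 250, 250, 250]  # four hypothetical VARCHAR(250) key parts
    for cs in ("utf8mb3", "utf8mb4"):
        print(cs, index_key_bytes(columns, cs), fits(columns, cs))
    # utf8mb3 3000 True
    # utf8mb4 4000 False
```

   The same index that fits comfortably under `utf8mb3` overflows under `utf8mb4` purely because of the extra reserved byte per character.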
   
   This is true for MySQL 5.7, and it remains a problem in MySQL 8, even though we do not officially support it (yet). Additionally, in MySQL 8 `utf8mb3` is deprecated and `utf8mb4` is the default character set; `utf8` is still an alias for `utf8mb3` there, but it is expected to become an alias for `utf8mb4` in a future release: https://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/
   
   I implemented a change that allows using the `utf8mb4` character set, but it requires special Airflow configuration: https://github.com/apache/airflow/pull/7570/files
   
   How it works: if the whole database uses `utf8mb4`, set the `sql_engine_collation_for_ids` option in the Airflow configuration to a `utf8mb3` collation (for example `utf8mb3_general_ci`). This switches the long ID columns that are used in indexes (such as `dag_id` and `task_id`) to `utf8mb3`, so the indexes on those columns do not exceed the maximum index size.
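
   A sketch of the effect, assuming the option lives in the `[core]` section (as in the Airflow 1.10 series) and takes a MySQL collation name such as `utf8mb3_general_ci`. The helper names and the four `VARCHAR(250)` ID columns are hypothetical. It derives the ID columns' character set from the collation prefix and applies worst-case byte arithmetic (column length times maximum bytes per character).

```python
import configparser

MAX_INDEX_KEY_BYTES = 3072
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb3": 3, "utf8mb4": 4}


def id_charset(cfg_text, database_charset="utf8mb4"):
    """Charset the ID columns end up with, inferred from the collation prefix."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    collation = parser.get("core", "sql_engine_collation_for_ids", fallback=None)
    if collation is None:
        return database_charset  # ID columns inherit the database default
    return collation.split("_", 1)[0]  # "utf8mb3_general_ci" -> "utf8mb3"


def id_key_bytes(id_lengths, charset):
    """Worst-case bytes for an index built only from the given ID columns."""
    return sum(n * MAX_BYTES_PER_CHAR[charset] for n in id_lengths)


if __name__ == "__main__":
    cfg = """
[core]
sql_alchemy_conn = mysql://airflow:airflow@localhost:3306/airflow
sql_engine_collation_for_ids = utf8mb3_general_ci
"""
    ids = [250, 250, 250, 250]  # hypothetical ID columns in one index
    for text in (cfg, "[core]\n"):
        cs = id_charset(text)
        size = id_key_bytes(ids, cs)
        print(cs, size, size <= MAX_INDEX_KEY_BYTES)
    # utf8mb3 3000 True
    # utf8mb4 4000 False
```

   Only the ID columns change character set; the rest of a `utf8mb4` database keeps its full 4-byte encoding.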
   
   Hopefully, in MySQL 9 (or 10?) they will further increase the index size ¯\_(ツ)_/¯




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org