Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/15 13:37:37 UTC

[GitHub] [airflow] mik-laj opened a new pull request #13696: Improvements for documentation on database setup

mik-laj opened a new pull request #13696:
URL: https://github.com/apache/airflow/pull/13696


   - First of all, the previous version had a title that did not describe its content well. I corrected it, which will make this page easier to find.
   - I have added sections that describe the requirements for the supported database versions. Previously, there was only a link to HA requirements, but no description for the basic configuration.
   - I have moved the sections on database URI to the top of this document as this knowledge is needed in the following sections.
   - I have created separate sections for MySQL and PostgreSQL because users who use MySQL are not interested in the information for PostgreSQL and vice versa. This also allowed us to delete various blocks of notes that were difficult to format and read.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj merged pull request #13696: Improvements for database setup docs

Posted by GitBox <gi...@apache.org>.
mik-laj merged pull request #13696:
URL: https://github.com/apache/airflow/pull/13696


   





[GitHub] [airflow] github-actions[bot] commented on pull request #13696: Improvements for database setup docs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13696:
URL: https://github.com/apache/airflow/pull/13696#issuecomment-761212887


   The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest master or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] kaxil commented on a change in pull request #13696: Improvements for database setup docs

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13696:
URL: https://github.com/apache/airflow/pull/13696#discussion_r558359659



##########
File path: docs/apache-airflow/howto/set-up-database.rst
##########
@@ -0,0 +1,147 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+
+Set up a Database Backend
+=========================
+
+Airflow was built to interact with its metadata using `SQLAlchemy <https://docs.sqlalchemy.org/en/13/>`__.
+
+This document describes the supported database engines, the changes their configuration needs before they can be used with Airflow, and the changes to the Airflow configuration needed to connect to them.
+
+Choosing database backend
+-------------------------
+
+If you want to take a real test drive of Airflow, you should consider setting up a database backend on **MySQL** or **PostgreSQL**.
+By default, Airflow uses **SQLite**, which is intended for development purposes only.
+
+Airflow supports the following database engine versions, so check which version you have. Old versions may not support all SQL statements.
+
+  * PostgreSQL:  9.6, 10, 11, 12, 13
+  * MySQL: 5.7, 8
+  * SQLite: 3.15.0+
+
+If you plan on running more than one scheduler, you have to meet additional requirements.
+For details, see :ref:`Scheduler HA Database Requirements <scheduler:ha:db_requirements>`.
+
+Database URI
+------------
+
+Airflow uses SQLAlchemy to connect to the database, which requires you to configure the database URL.
+You can do this with the ``sql_alchemy_conn`` option in the ``[core]`` section. It is also common to configure
+this option with the ``AIRFLOW__CORE__SQL_ALCHEMY_CONN`` environment variable.
+
+.. note::
+    For more information on setting the configuration, see :doc:`/howto/set-config`.
+
+If you want to check the current value, you can use the ``airflow config get-value core sql_alchemy_conn`` command, as in
+the example below.
+
+.. code-block:: bash
+
+    $ airflow config get-value core sql_alchemy_conn
+    sqlite:////tmp/airflow/airflow.db
+
+The exact format is described in the SQLAlchemy documentation; see `Database URLs <https://docs.sqlalchemy.org/en/14/core/engines.html>`__. Some examples are also shown below.
+
+Set up a MySQL

Review comment:
       ```suggestion
   Set up MySQL
   ```
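
The "Database URI" section quoted above says that ``sql_alchemy_conn`` in ``[core]`` can also be set through the ``AIRFLOW__CORE__SQL_ALCHEMY_CONN`` environment variable. The following is a minimal sketch of that naming pattern, not code from the pull request. The helper name `airflow_env_var` is hypothetical and is not part of Airflow; the SQLite path is the one from the quoted example output.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Return the environment variable name for an Airflow config option.

    Hypothetical helper (not part of Airflow): it only mirrors the
    AIRFLOW__{SECTION}__{KEY} naming pattern seen in the quoted docs.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


name = airflow_env_var("core", "sql_alchemy_conn")

# Set the variable for this process and for any child process it starts.
os.environ[name] = "sqlite:////tmp/airflow/airflow.db"
print(name)
```

A variable set this way is only visible to the current process and its children, so in practice it would more likely be exported in the shell or service definition that starts Airflow.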







[GitHub] [airflow] kaxil commented on a change in pull request #13696: Improvements for database setup docs

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13696:
URL: https://github.com/apache/airflow/pull/13696#discussion_r558359659



##########
File path: docs/apache-airflow/howto/set-up-database.rst
##########
@@ -0,0 +1,147 @@
+Set up a MySQL

Review comment:
       ```suggestion
   Set up MySQL
   ```
   
   or
   
   ```suggestion
   Setting up a MySQL Database
   ```
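
The "Choosing database backend" section of the file under review lists SQLite 3.15.0+ as a supported version. As a rough sketch, assuming the SQLite library linked into the local Python build is the one Airflow would use, the version can be compared against that minimum with the standard library alone. The function name `sqlite_is_supported` is illustrative only.

```python
import sqlite3

# Minimum SQLite version listed in the quoted documentation.
MIN_SQLITE = (3, 15, 0)


def sqlite_is_supported(version_info=None) -> bool:
    """Check a SQLite version tuple against the documented minimum."""
    if version_info is None:
        # Version of the SQLite library this Python build links against.
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) >= MIN_SQLITE


print(sqlite3.sqlite_version, sqlite_is_supported())
```

Tuple comparison is used instead of string comparison so that, for example, 3.9.0 is correctly treated as older than 3.15.0.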







[GitHub] [airflow] kaxil commented on a change in pull request #13696: Improvements for database setup docs

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13696:
URL: https://github.com/apache/airflow/pull/13696#discussion_r558360876



##########
File path: docs/apache-airflow/howto/set-up-database.rst
##########
@@ -0,0 +1,147 @@
+Set up a MySQL
+--------------
+
+You need to create a database and a database user that Airflow will use to access this database.
+In the example below, a database ``airflow_db`` and a user with username ``airflow_user`` and password ``airflow_pass`` will be created.
+
+.. code-block:: sql
+
+   CREATE DATABASE airflow_db CHARACTER SET utf8 COLLATE utf8_unicode_ci;
+   CREATE USER 'airflow_user' IDENTIFIED BY 'airflow_pass';
+   GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user';
+
+We rely on stricter ANSI SQL settings for MySQL in order to have sane defaults.
+Make sure to specify the ``explicit_defaults_for_timestamp=1`` option under the ``[mysqld]`` section
+of your ``my.cnf`` file. You can also activate this option with the ``--explicit-defaults-for-timestamp`` switch passed to the ``mysqld`` executable.
+
+We recommend using the ``mysqlclient`` driver and specifying it in your SQLAlchemy connection string.
+
+.. code-block:: text
+
+    mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>
+
+But we also support the ``mysql-connector-python`` driver, which lets you connect through SSL
+without any cert options provided.
+
+.. code-block:: text
+
+   mysql+mysqlconnector://<user>:<password>@<host>[:<port>]/<dbname>
+
+However, if you want to use other drivers, see the `MySQL Dialect <https://docs.sqlalchemy.org/en/13/dialects/mysql.html>`__ page in the SQLAlchemy documentation for more information on downloading them
+and setting up the SQLAlchemy connection.
+
+Set up a PostgreSQL
+-------------------
+
+You need to create a database and a database user that Airflow will use to access this database.
+In the example below, a database ``airflow_db`` and a user with username ``airflow_user`` and password ``airflow_pass`` will be created.
+
+.. code-block:: sql
+
+   CREATE DATABASE airflow_db;
+   CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
+   GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
+
+You need to create a database and a user that Airflow will use to access this database.

Review comment:
       ```suggestion
   ```
   
   already mentioned before the code-block
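
The MySQL section quoted above gives the URI templates ``mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>`` and ``mysql+mysqlconnector://...``. The sketch below shows one way such a URI could be assembled, with the user name and password percent-encoded so that characters like ``@`` or ``/`` do not break URL parsing. The helper `build_db_uri`, the host `localhost`, the port `3306` and the second password are illustrative assumptions. The `postgresql+psycopg2` scheme does not appear in the quoted diff; it is SQLAlchemy's dialect name for the psycopg2 driver and is used here only as an example.

```python
from urllib.parse import quote


def build_db_uri(scheme, user, password, host, dbname, port=None):
    """Assemble <scheme>://<user>:<password>@<host>[:<port>]/<dbname>.

    Hypothetical helper, not part of Airflow or SQLAlchemy.
    """
    # safe="" makes quote() encode every reserved character, including "/".
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    if port is not None:
        netloc += f":{port}"
    return f"{scheme}://{netloc}/{dbname}"


# Credentials and database name taken from the quoted SQL examples.
mysql_uri = build_db_uri(
    "mysql+mysqldb", "airflow_user", "airflow_pass", "localhost", "airflow_db", 3306
)

# An invented password with special characters, to show the encoding.
pg_uri = build_db_uri(
    "postgresql+psycopg2", "airflow_user", "p@ss/word", "localhost", "airflow_db"
)

print(mysql_uri)
print(pg_uri)
```

The resulting string is what would go into ``sql_alchemy_conn`` or ``AIRFLOW__CORE__SQL_ALCHEMY_CONN``.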







[GitHub] [airflow] kaxil commented on a change in pull request #13696: Improvements for database setup docs

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13696:
URL: https://github.com/apache/airflow/pull/13696#discussion_r558360461



##########
File path: docs/apache-airflow/howto/set-up-database.rst
##########
@@ -0,0 +1,147 @@
+Set up a PostgreSQL
+-------------------

Review comment:
       ```suggestion
   Set up a PostgreSQL Database
   -----------------------------
   ```
   
   or 
   
   ```suggestion
   Set up PostgreSQL
   -----------------------------
   ```







[GitHub] [airflow] mik-laj commented on pull request #13696: Improvements for database setup docs

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #13696:
URL: https://github.com/apache/airflow/pull/13696#issuecomment-761215413


   @kaxil Updated. See:  https://github.com/apache/airflow/pull/13696/commits/1662aa05940b388a837d96065a15a595f2125768

