You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/23 10:15:28 UTC

[GitHub] [airflow] Dr-Denzy opened a new pull request #13281: This commit creates a How-To guide for PostgresOperator.

Dr-Denzy opened a new pull request #13281:
URL: https://github.com/apache/airflow/pull/13281


   It explores how to create tasks that interact with the PostgreSQL database.
   
   This How-To guide demonstrates some of the best practices for using the PostgresOperator class.
   
   closes: #11917
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #13281:
URL: https://github.com/apache/airflow/pull/13281


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r552509538



##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime
+    from airflow import DAG
+    from airflow.providers.postgres.operators.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }
+
+
+    with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = """
+                                            CREATE TABLE IF NOT EXISTS pet (
+                                                  pet_id SERIAL PRIMARY KEY,
+                                                  name VARCHAR NOT NULL,
+                                                  pet_type VARCHAR NOT NULL,
+                                                  birth_date DATE NOT NULL,
+                                                  OWNER VARCHAR NOT NULL);
+                                                  """
+                                            )

Review comment:
       ```suggestion
       with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
       create_pet_table = PostgresOperator(
           task_id="create_pet_table",
           postgres_conn_id = "postgres_default",
           sql = """
           CREATE TABLE IF NOT EXISTS pet (
                 pet_id SERIAL PRIMARY KEY,
                 name VARCHAR NOT NULL,
                 pet_type VARCHAR NOT NULL,
                 birth_date DATE NOT NULL,
                 OWNER VARCHAR NOT NULL);
                 """
           )
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13281: This commit creates a How-To guide for PostgresOperator.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r547971591



##########
File path: docs/apache-airflow/howto/operator/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,197 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.

Review comment:
       I think we are moving away from 1.10 and we can safely remove the reference to it from here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] AmarEL commented on a change in pull request #13281: This commit creates a How-To guide for PostgresOperator.

Posted by GitBox <gi...@apache.org>.
AmarEL commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r548752923



##########
File path: docs/apache-airflow/howto/operator/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,197 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+In ``Airflow-1.10.X``, the ``PostgresOperator`` class is in the ``postgres_operator`` module housed in the ``operators`` package.
+
+However, in ``Airflow-2.0``, the ``PostgresOperator`` class now resides at ``airflow.providers.postgres.operator.postgres``.
+You will get a deprecation warning if you try to import the ``postgres_operator`` module.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime
+
+    from airflow import DAG
+    from airflow.providers.postgres.operator.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }
+
+
+    with DAG(dag_id="postgres_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = """
+                                            CREATE TABLE IF NOT EXISTS pet (
+                                                  pet_id SERIAL PRIMARY KEY,
+                                                  name VARCHAR NOT NULL,
+                                                  pet_type VARCHAR NOT NULL,
+                                                  birth_date DATE NOT NULL,
+                                                  OWNER VARCHAR NOT NULL);
+                                                  """
+                                            )
+
+
+Dumping SQL statements into your PostgresOperator isn't quite appealing and will create maintainability pains somewhere
+down to the road. To prevent this, Airflow offers an elegant solution. This is how it works: you simply create
+a directory inside the DAG folder called ``sql`` and then put all the SQL files containing your SQL queries inside it.
+
+Your ``dags/sql/pet_schema.sql`` should like this:
+
+::
+
+      -- create pet table
+      CREATE TABLE IF NOT EXISTS pet (
+          pet_id SERIAL PRIMARY KEY,
+          name VARCHAR NOT NULL,
+          pet_type VARCHAR NOT NULL,
+          birth_date DATE NOT NULL,
+          OWNER VARCHAR NOT NULL);
+
+
+Now let's refactor ``create_pet_table`` in our DAG:
+
+.. code-block:: python
+
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = "sql/pet_schema.sql"
+                                            )
+
+
+Inserting data into a Postgres database table
+---------------------------------------------
+
+Let's say we already have the SQL insert statement below in our ``dags/sql/pet_schema.sql`` file:
+
+::
+
+  -- populate pet table
+  INSERT INTO pet VALUES ( 'Max', 'Dog', '2018-07-05', 'Jane');
+  INSERT INTO pet VALUES ( 'Susie', 'Cat', '2019-05-01', 'Phil');
+  INSERT INTO pet VALUES ( 'Lester', 'Hamster', '2020-06-23', 'Lily');
+  INSERT INTO pet VALUES ( 'Quincy', 'Parrot', '2013-08-11', 'Anne');
+
+We can then create a PostgresOperator task that populate the ``pet`` table.
+
+.. code-block:: python
+
+  populate_pet_table = PostgresOperator(
+                                        "populate_pet_table",
+                                        postgres_conn_id = "postgres_default",
+                                        sql = "sql/pet_schema.sql"
+                                        )
+
+
+Fetching records from your postgres database table
+--------------------------------------------------
+
+Fetching records from your postgres database table can be as simple as:

Review comment:
       I think that it's important to comment that the result of the SELECT command will not be returned as a xcom variable, for example. The execute method from the PostgresOperator class will just print the result in the log.
   
   I saw this behavior last week trying to pull a SELECT result from a previous OracleOperator's task and just check that the same occurs to PostgresOperator.
   
   https://stackoverflow.com/a/54101400/5499208




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r552510318



##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime
+    from airflow import DAG
+    from airflow.providers.postgres.operators.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }
+
+
+    with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = """
+                                            CREATE TABLE IF NOT EXISTS pet (
+                                                  pet_id SERIAL PRIMARY KEY,
+                                                  name VARCHAR NOT NULL,
+                                                  pet_type VARCHAR NOT NULL,
+                                                  birth_date DATE NOT NULL,
+                                                  OWNER VARCHAR NOT NULL);
+                                                  """
+                                            )
+
+
+Dumping SQL statements into your PostgresOperator isn't quite appealing and will create maintainability pains somewhere
+down to the road. To prevent this, Airflow offers an elegant solution. This is how it works: you simply create
+a directory inside the DAG folder called ``sql`` and then put all the SQL files containing your SQL queries inside it.
+
+Your ``dags/sql/pet_schema.sql`` should like this:
+
+::
+
+      -- create pet table
+      CREATE TABLE IF NOT EXISTS pet (
+          pet_id SERIAL PRIMARY KEY,
+          name VARCHAR NOT NULL,
+          pet_type VARCHAR NOT NULL,
+          birth_date DATE NOT NULL,
+          OWNER VARCHAR NOT NULL);
+
+
+Now let's refactor ``create_pet_table`` in our DAG:
+
+.. code-block:: python
+
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = "sql/pet_schema.sql"
+                                            )
+
+
+Inserting data into a Postgres database table
+---------------------------------------------
+
+Let's say we already have the SQL insert statement below in our ``dags/sql/pet_schema.sql`` file:
+
+::
+
+  -- populate pet table
+  INSERT INTO pet VALUES ( 'Max', 'Dog', '2018-07-05', 'Jane');
+  INSERT INTO pet VALUES ( 'Susie', 'Cat', '2019-05-01', 'Phil');
+  INSERT INTO pet VALUES ( 'Lester', 'Hamster', '2020-06-23', 'Lily');
+  INSERT INTO pet VALUES ( 'Quincy', 'Parrot', '2013-08-11', 'Anne');
+
+We can then create a PostgresOperator task that populate the ``pet`` table.
+
+.. code-block:: python
+
+  populate_pet_table = PostgresOperator(
+                                        "populate_pet_table",
+                                        postgres_conn_id = "postgres_default",
+                                        sql = "sql/pet_schema.sql"
+                                        )
+
+
+Fetching records from your postgres database table
+--------------------------------------------------
+
+Fetching records from your postgres database table can be as simple as:
+
+.. code-block:: python
+
+  get_all_pets = PostgresOperator(
+                              "get_all_pets",
+                              postgres_conn_id = "postgres_default",
+                              sql = "SELECT * FROM pet;"
+                              )

Review comment:
       ```suggestion
     get_all_pets = PostgresOperator(
         "get_all_pets",
         postgres_conn_id="postgres_default",
         sql="SELECT * FROM pet;"
     )
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r552509986



##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime
+    from airflow import DAG
+    from airflow.providers.postgres.operators.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }
+
+
+    with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = """
+                                            CREATE TABLE IF NOT EXISTS pet (
+                                                  pet_id SERIAL PRIMARY KEY,
+                                                  name VARCHAR NOT NULL,
+                                                  pet_type VARCHAR NOT NULL,
+                                                  birth_date DATE NOT NULL,
+                                                  OWNER VARCHAR NOT NULL);
+                                                  """
+                                            )
+
+
+Dumping SQL statements into your PostgresOperator isn't quite appealing and will create maintainability pains somewhere
+down to the road. To prevent this, Airflow offers an elegant solution. This is how it works: you simply create
+a directory inside the DAG folder called ``sql`` and then put all the SQL files containing your SQL queries inside it.
+
+Your ``dags/sql/pet_schema.sql`` should like this:
+
+::
+
+      -- create pet table
+      CREATE TABLE IF NOT EXISTS pet (
+          pet_id SERIAL PRIMARY KEY,
+          name VARCHAR NOT NULL,
+          pet_type VARCHAR NOT NULL,
+          birth_date DATE NOT NULL,
+          OWNER VARCHAR NOT NULL);
+
+
+Now let's refactor ``create_pet_table`` in our DAG:
+
+.. code-block:: python
+
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = "sql/pet_schema.sql"
+                                            )
+
+
+Inserting data into a Postgres database table
+---------------------------------------------
+
+Let's say we already have the SQL insert statement below in our ``dags/sql/pet_schema.sql`` file:
+
+::
+
+  -- populate pet table
+  INSERT INTO pet VALUES ( 'Max', 'Dog', '2018-07-05', 'Jane');
+  INSERT INTO pet VALUES ( 'Susie', 'Cat', '2019-05-01', 'Phil');
+  INSERT INTO pet VALUES ( 'Lester', 'Hamster', '2020-06-23', 'Lily');
+  INSERT INTO pet VALUES ( 'Quincy', 'Parrot', '2013-08-11', 'Anne');
+
+We can then create a PostgresOperator task that populate the ``pet`` table.
+
+.. code-block:: python
+
+  populate_pet_table = PostgresOperator(
+                                        "populate_pet_table",
+                                        postgres_conn_id = "postgres_default",
+                                        sql = "sql/pet_schema.sql"
+                                        )

Review comment:
       Needs proper formatting




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r552510572



##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime
+    from airflow import DAG
+    from airflow.providers.postgres.operators.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }
+
+
+    with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = """
+                                            CREATE TABLE IF NOT EXISTS pet (
+                                                  pet_id SERIAL PRIMARY KEY,
+                                                  name VARCHAR NOT NULL,
+                                                  pet_type VARCHAR NOT NULL,
+                                                  birth_date DATE NOT NULL,
+                                                  OWNER VARCHAR NOT NULL);
+                                                  """
+                                            )
+
+
+Dumping SQL statements into your PostgresOperator isn't quite appealing and will create maintainability pains somewhere
+down to the road. To prevent this, Airflow offers an elegant solution. This is how it works: you simply create
+a directory inside the DAG folder called ``sql`` and then put all the SQL files containing your SQL queries inside it.
+
+Your ``dags/sql/pet_schema.sql`` should like this:
+
+::
+
+      -- create pet table
+      CREATE TABLE IF NOT EXISTS pet (
+          pet_id SERIAL PRIMARY KEY,
+          name VARCHAR NOT NULL,
+          pet_type VARCHAR NOT NULL,
+          birth_date DATE NOT NULL,
+          OWNER VARCHAR NOT NULL);
+
+
+Now let's refactor ``create_pet_table`` in our DAG:
+
+.. code-block:: python
+
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = "sql/pet_schema.sql"
+                                            )
+
+
+Inserting data into a Postgres database table
+---------------------------------------------
+
+Let's say we already have the SQL insert statement below in our ``dags/sql/pet_schema.sql`` file:
+
+::
+
+  -- populate pet table
+  INSERT INTO pet VALUES ( 'Max', 'Dog', '2018-07-05', 'Jane');
+  INSERT INTO pet VALUES ( 'Susie', 'Cat', '2019-05-01', 'Phil');
+  INSERT INTO pet VALUES ( 'Lester', 'Hamster', '2020-06-23', 'Lily');
+  INSERT INTO pet VALUES ( 'Quincy', 'Parrot', '2013-08-11', 'Anne');
+
+We can then create a PostgresOperator task that populate the ``pet`` table.
+
+.. code-block:: python
+
+  populate_pet_table = PostgresOperator(
+                                        "populate_pet_table",
+                                        postgres_conn_id = "postgres_default",
+                                        sql = "sql/pet_schema.sql"
+                                        )
+
+
+Fetching records from your postgres database table
+--------------------------------------------------
+
+Fetching records from your postgres database table can be as simple as:
+
+.. code-block:: python
+
+  get_all_pets = PostgresOperator(
+                              "get_all_pets",
+                              postgres_conn_id = "postgres_default",
+                              sql = "SELECT * FROM pet;"
+                              )
+
+
+
+Passing Parameters into PostgresOperator
+----------------------------------------
+
+PostgresOperator provides ``parameters`` attribute which makes it possible to dynamically inject values into your
+SQL requests during runtime. The BaseOperator class has the ``params`` attribute which is available to the PostgresOperator
+by virtue of inheritance. Both ``parameters`` and ``params`` make it possible to dynamically pass in parameters in many
+interesting ways.
+
+To find the owner of the pet called 'Lester':
+
+.. code-block:: python
+
+  get_birth_date = PostgresOperator(
+                                "get_birth_date",
+                                postgres_conn_id = "postgres_default",
+                                sql = "SELECT * FROM pet WHERE birth_date BETWEEN SYMMETRIC %(begin_date)s AND %(end_date)s",
+                                parameters = {
+                                              'begin_date': '2020-01-01',
+                                              'end_date': '2020-12-31'
+                                              }
+                                )
+
+Now lets refactor our ``get_birth_date`` task. Instead of dumping SQL statements directly into our code, let's tidy things up
+by creating a sql file.
+
+::
+
+  -- dags/sql/birth_date.sql
+  SELECT * FROM pet WHERE birth_date BETWEEN SYMMETRIC {{ params.begin_date }} AND {{ params.end_date }};
+
+And this time we will use the ``params`` attribute which we get for free from the parent ``BaseOperator``
+class.
+
+.. code-block:: python
+
+  get_birth_date = PostgresOperator(
+                                "get_birth_date",
+                                postgres_conn_id = "postgres_default",
+                                sql = "sql/birth_date.sql",
+                                params = {
+                                           'begin_date': '2020-01-01',
+                                            'end_date': '2020-12-31'
+                                          }
+                                )

Review comment:
       Same here

##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime
+    from airflow import DAG
+    from airflow.providers.postgres.operators.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }
+
+
+    with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = """
+                                            CREATE TABLE IF NOT EXISTS pet (
+                                                  pet_id SERIAL PRIMARY KEY,
+                                                  name VARCHAR NOT NULL,
+                                                  pet_type VARCHAR NOT NULL,
+                                                  birth_date DATE NOT NULL,
+                                                  OWNER VARCHAR NOT NULL);
+                                                  """
+                                            )
+
+
+Dumping SQL statements into your PostgresOperator isn't quite appealing and will create maintainability pains somewhere
+down to the road. To prevent this, Airflow offers an elegant solution. This is how it works: you simply create
+a directory inside the DAG folder called ``sql`` and then put all the SQL files containing your SQL queries inside it.
+
+Your ``dags/sql/pet_schema.sql`` should like this:
+
+::
+
+      -- create pet table
+      CREATE TABLE IF NOT EXISTS pet (
+          pet_id SERIAL PRIMARY KEY,
+          name VARCHAR NOT NULL,
+          pet_type VARCHAR NOT NULL,
+          birth_date DATE NOT NULL,
+          OWNER VARCHAR NOT NULL);
+
+
+Now let's refactor ``create_pet_table`` in our DAG:
+
+.. code-block:: python
+
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = "sql/pet_schema.sql"
+                                            )
+
+
+Inserting data into a Postgres database table
+---------------------------------------------
+
+Let's say we already have the SQL insert statement below in our ``dags/sql/pet_schema.sql`` file:
+
+::
+
+  -- populate pet table
+  INSERT INTO pet VALUES ( 'Max', 'Dog', '2018-07-05', 'Jane');
+  INSERT INTO pet VALUES ( 'Susie', 'Cat', '2019-05-01', 'Phil');
+  INSERT INTO pet VALUES ( 'Lester', 'Hamster', '2020-06-23', 'Lily');
+  INSERT INTO pet VALUES ( 'Quincy', 'Parrot', '2013-08-11', 'Anne');
+
+We can then create a PostgresOperator task that populate the ``pet`` table.
+
+.. code-block:: python
+
+  populate_pet_table = PostgresOperator(
+                                        "populate_pet_table",
+                                        postgres_conn_id = "postgres_default",
+                                        sql = "sql/pet_schema.sql"
+                                        )
+
+
+Fetching records from your postgres database table
+--------------------------------------------------
+
+Fetching records from your postgres database table can be as simple as:
+
+.. code-block:: python
+
+  get_all_pets = PostgresOperator(
+                              "get_all_pets",
+                              postgres_conn_id = "postgres_default",
+                              sql = "SELECT * FROM pet;"
+                              )
+
+
+
+Passing Parameters into PostgresOperator
+----------------------------------------
+
+PostgresOperator provides ``parameters`` attribute which makes it possible to dynamically inject values into your
+SQL requests during runtime. The BaseOperator class has the ``params`` attribute which is available to the PostgresOperator
+by virtue of inheritance. Both ``parameters`` and ``params`` make it possible to dynamically pass in parameters in many
+interesting ways.
+
+To find the owner of the pet called 'Lester':
+
+.. code-block:: python
+
+  get_birth_date = PostgresOperator(
+                                "get_birth_date",
+                                postgres_conn_id = "postgres_default",
+                                sql = "SELECT * FROM pet WHERE birth_date BETWEEN SYMMETRIC %(begin_date)s AND %(end_date)s",
+                                parameters = {
+                                              'begin_date': '2020-01-01',
+                                              'end_date': '2020-12-31'
+                                              }
+                                )
+
+Now lets refactor our ``get_birth_date`` task. Instead of dumping SQL statements directly into our code, let's tidy things up
+by creating a sql file.
+
+::
+
+  -- dags/sql/birth_date.sql
+  SELECT * FROM pet WHERE birth_date BETWEEN SYMMETRIC {{ params.begin_date }} AND {{ params.end_date }};
+
+And this time we will use the ``params`` attribute which we get for free from the parent ``BaseOperator``
+class.
+
+.. code-block:: python
+
+  get_birth_date = PostgresOperator(
+                                "get_birth_date",
+                                postgres_conn_id = "postgres_default",
+                                sql = "sql/birth_date.sql",
+                                params = {
+                                           'begin_date': '2020-01-01',
+                                            'end_date': '2020-12-31'
+                                          }
+                                )
+
+The complete Postgres Operator DAG
+----------------------------------
+
+When we put everything together, our DAG should look like this:
+
+.. code-block:: python
+
+    import datetime
+    from airflow import DAG
+    from airflow.providers.postgres.operators.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }

Review comment:
       Same here




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r552511322



##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime

Review comment:
       Instead of re-writing the DAG here, you can include the example DAG similar to https://github.com/apache/airflow/blob/master/docs/apache-airflow-providers-http/operators.rst




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r552509905



##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime
+    from airflow import DAG
+    from airflow.providers.postgres.operators.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }
+
+
+    with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = """
+                                            CREATE TABLE IF NOT EXISTS pet (
+                                                  pet_id SERIAL PRIMARY KEY,
+                                                  name VARCHAR NOT NULL,
+                                                  pet_type VARCHAR NOT NULL,
+                                                  birth_date DATE NOT NULL,
+                                                  OWNER VARCHAR NOT NULL);
+                                                  """
+                                            )
+
+
+Dumping SQL statements into your PostgresOperator isn't quite appealing and will create maintainability pains somewhere
+down to the road. To prevent this, Airflow offers an elegant solution. This is how it works: you simply create
+a directory inside the DAG folder called ``sql`` and then put all the SQL files containing your SQL queries inside it.
+
+Your ``dags/sql/pet_schema.sql`` should like this:
+
+::
+
+      -- create pet table
+      CREATE TABLE IF NOT EXISTS pet (
+          pet_id SERIAL PRIMARY KEY,
+          name VARCHAR NOT NULL,
+          pet_type VARCHAR NOT NULL,
+          birth_date DATE NOT NULL,
+          OWNER VARCHAR NOT NULL);
+
+
+Now let's refactor ``create_pet_table`` in our DAG:
+
+.. code-block:: python
+
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = "sql/pet_schema.sql"
+                                            )

Review comment:
       This needs proper formatting too




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r552511322



##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime

Review comment:
       Instead of re-writing the DAG here, you can include the example DAG similar to https://github.com/apache/airflow/blob/master/docs/apache-airflow-providers-http/operators.rst (it uses https://github.com/apache/airflow/blob/master/airflow/providers/http/example_dags/example_http.py)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13281: This commit creates a How-To guide for PostgresOperator.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r547971938



##########
File path: docs/apache-airflow/howto/operator/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,197 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.

Review comment:
       This is forward-looking. In a few months we hope 1.10 will be more of an afterthought :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#issuecomment-754214268


   [The Workflow run](https://github.com/apache/airflow/actions/runs/461932592) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Dr-Denzy commented on a change in pull request #13281: This commit creates a How-To guide for PostgresOperator.

Posted by GitBox <gi...@apache.org>.
Dr-Denzy commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r547981185



##########
File path: docs/apache-airflow/howto/operator/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,197 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator

Review comment:
       Ok Noted. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r552510511



##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime
+    from airflow import DAG
+    from airflow.providers.postgres.operators.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }
+
+
+    with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = """
+                                            CREATE TABLE IF NOT EXISTS pet (
+                                                  pet_id SERIAL PRIMARY KEY,
+                                                  name VARCHAR NOT NULL,
+                                                  pet_type VARCHAR NOT NULL,
+                                                  birth_date DATE NOT NULL,
+                                                  OWNER VARCHAR NOT NULL);
+                                                  """
+                                            )
+
+
+Dumping SQL statements into your PostgresOperator isn't quite appealing and will create maintainability pains somewhere
+down to the road. To prevent this, Airflow offers an elegant solution. This is how it works: you simply create
+a directory inside the DAG folder called ``sql`` and then put all the SQL files containing your SQL queries inside it.
+
+Your ``dags/sql/pet_schema.sql`` should like this:
+
+::
+
+      -- create pet table
+      CREATE TABLE IF NOT EXISTS pet (
+          pet_id SERIAL PRIMARY KEY,
+          name VARCHAR NOT NULL,
+          pet_type VARCHAR NOT NULL,
+          birth_date DATE NOT NULL,
+          OWNER VARCHAR NOT NULL);
+
+
+Now let's refactor ``create_pet_table`` in our DAG:
+
+.. code-block:: python
+
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = "sql/pet_schema.sql"
+                                            )
+
+
+Inserting data into a Postgres database table
+---------------------------------------------
+
+Let's say we already have the SQL insert statement below in our ``dags/sql/pet_schema.sql`` file:
+
+::
+
+  -- populate pet table
+  INSERT INTO pet VALUES ( 'Max', 'Dog', '2018-07-05', 'Jane');
+  INSERT INTO pet VALUES ( 'Susie', 'Cat', '2019-05-01', 'Phil');
+  INSERT INTO pet VALUES ( 'Lester', 'Hamster', '2020-06-23', 'Lily');
+  INSERT INTO pet VALUES ( 'Quincy', 'Parrot', '2013-08-11', 'Anne');
+
+We can then create a PostgresOperator task that populate the ``pet`` table.
+
+.. code-block:: python
+
+  populate_pet_table = PostgresOperator(
+                                        "populate_pet_table",
+                                        postgres_conn_id = "postgres_default",
+                                        sql = "sql/pet_schema.sql"
+                                        )
+
+
+Fetching records from your postgres database table
+--------------------------------------------------
+
+Fetching records from your postgres database table can be as simple as:
+
+.. code-block:: python
+
+  get_all_pets = PostgresOperator(
+                              "get_all_pets",
+                              postgres_conn_id = "postgres_default",
+                              sql = "SELECT * FROM pet;"
+                              )
+
+
+
+Passing Parameters into PostgresOperator
+----------------------------------------
+
+PostgresOperator provides ``parameters`` attribute which makes it possible to dynamically inject values into your
+SQL requests during runtime. The BaseOperator class has the ``params`` attribute which is available to the PostgresOperator
+by virtue of inheritance. Both ``parameters`` and ``params`` make it possible to dynamically pass in parameters in many
+interesting ways.
+
+To find the owner of the pet called 'Lester':
+
+.. code-block:: python
+
+  get_birth_date = PostgresOperator(
+                                "get_birth_date",
+                                postgres_conn_id = "postgres_default",
+                                sql = "SELECT * FROM pet WHERE birth_date BETWEEN SYMMETRIC %(begin_date)s AND %(end_date)s",
+                                parameters = {
+                                              'begin_date': '2020-01-01',
+                                              'end_date': '2020-12-31'
+                                              }
+                                )

Review comment:
       Needs proper formatting




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Dr-Denzy commented on a change in pull request #13281: This commit creates a How-To guide for PostgresOperator.

Posted by GitBox <gi...@apache.org>.
Dr-Denzy commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r547980508



##########
File path: docs/apache-airflow/howto/operator/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,197 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.

Review comment:
       Oh yeah. I agree.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r552509538



##########
File path: docs/apache-airflow-providers-postgres/operators/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,243 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator
+=================================
+
+Introduction
+------------
+
+Apache Airflow has a robust trove of operators that can be used to implement the various tasks that make up your
+workflow. Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges).
+
+A task defined or implemented by a operator is a unit of work in your data pipeline.
+
+The purpose of Postgres Operator is to define tasks involving interactions with the PostgreSQL database.
+ In ``Airflow-2.0``, the ``PostgresOperator`` class resides at ``airflow.providers.postgres.operator.postgres``.
+
+Under the hood, the ``PostgresOperator`` class delegates its heavy lifting to the ``PostgresHook`` class.
+
+Common Database Operations with PostgresOperator
+------------------------------------------------
+
+To use the postgres operator to carry out SQL request, two parameters are required: ``sql`` and ``postgres_conn_id``.
+These two parameters are eventually fed to the postgres hook object that interacts directly with the postgres database.
+
+Creating a Postgres database table
+----------------------------------
+
+The code snippets below are based on Airflow-2.0
+
+.. code-block:: python
+
+    import datetime
+    from airflow import DAG
+    from airflow.providers.postgres.operators.postgres import PostgresOperator
+
+
+    default_args = {
+                    "start_date": datetime.datetime(2020, 2, 2),
+                    "owner": "airflow"
+                    }
+
+
+    with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
+        create_pet_table = PostgresOperator(
+                                            "create_pet_table",
+                                            postgres_conn_id = "postgres_default",
+                                            sql = """
+                                            CREATE TABLE IF NOT EXISTS pet (
+                                                  pet_id SERIAL PRIMARY KEY,
+                                                  name VARCHAR NOT NULL,
+                                                  pet_type VARCHAR NOT NULL,
+                                                  birth_date DATE NOT NULL,
+                                                  OWNER VARCHAR NOT NULL);
+                                                  """
+                                            )

Review comment:
       ```suggestion
       with DAG(dag_id="postgres_operator_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
       create_pet_table = PostgresOperator(
           task_id="create_pet_table",
           postgres_conn_id = "postgres_default",
           sql = """
           CREATE TABLE IF NOT EXISTS pet (
                 pet_id SERIAL PRIMARY KEY,
                 name VARCHAR NOT NULL,
                 pet_type VARCHAR NOT NULL,
                 birth_date DATE NOT NULL,
                 OWNER VARCHAR NOT NULL);
                 """
       )
           )
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Dr-Denzy commented on pull request #13281: Add How-To guide for PostgresOperator

Posted by GitBox <gi...@apache.org>.
Dr-Denzy commented on pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#issuecomment-755319141


   Thanks @kaxil for the well thought out suggestions. I have made the necessary changes in line with your suggestions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13281: This commit creates a How-To guide for PostgresOperator.

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13281:
URL: https://github.com/apache/airflow/pull/13281#discussion_r547971876



##########
File path: docs/apache-airflow/howto/operator/postgres_operator_howto_guide.rst
##########
@@ -0,0 +1,197 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+How-to Guide for PostgresOperator

Review comment:
       This guide should be moved to `docs/apache-airflow-providers-postgres/operators` similar to https://github.com/apache/airflow/blob/master/docs/apache-airflow-providers-google/operators/ads.rst




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org