You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/17 09:56:43 UTC

[GitHub] [airflow] xinbinhuang commented on a change in pull request #12677: Refactor SQL/BigQuery/Qubole/Druid Check operators

xinbinhuang commented on a change in pull request #12677:
URL: https://github.com/apache/airflow/pull/12677#discussion_r559123773



##########
File path: airflow/operators/sql.py
##########
@@ -18,27 +18,59 @@
 from distutils.util import strtobool
 from typing import Any, Dict, Iterable, List, Mapping, Optional, SupportsAbs, Union
 
+from cached_property import cached_property
+
 from airflow.exceptions import AirflowException
 from airflow.hooks.base import BaseHook
+from airflow.hooks.dbapi import DbApiHook
 from airflow.models import BaseOperator, SkipMixin
 from airflow.utils.decorators import apply_defaults
 
-ALLOWED_CONN_TYPE = {
-    "google_cloud_platform",
-    "jdbc",
-    "mssql",
-    "mysql",
-    "odbc",
-    "oracle",
-    "postgres",
-    "presto",
-    "snowflake",
-    "sqlite",
-    "vertica",
-}
-
-
-class SQLCheckOperator(BaseOperator):
+
+class BaseSQLOperator(BaseOperator):
+    """
+    This is a base class for generic SQL Operator to get a DB Hook
+
+    The provided method is .get_db_hook(). The default behavior will try to
+    retrieve the DB hook based on connection type.
+    You can custom the behavior by overriding the .get_db_hook() method.
+    """
+
+    @apply_defaults
+    def __init__(self, *, conn_id: Optional[str] = None, database: Optional[str] = None, **kwargs):
+        super().__init__(**kwargs)
+        self.conn_id = conn_id
+        self.database = database
+
+    @cached_property
+    def _hook(self):
+        """Get DB Hook based on connection type"""
+        self.log.debug("Get connection for %s", self.conn_id)
+        conn = BaseHook.get_connection(self.conn_id)
+
+        hook = conn.get_hook()
+        if not isinstance(hook, DbApiHook):
+            raise AirflowException(
+                f'The connection type is not supported by {self.__class__.__name__}. '
+                f'The associated hook should be a subclass of `DbApiHook`. Got {hook.__class__.__name__}'
+            )
+
+        if self.database:
+            hook.schema = self.database

Review comment:
       I totally agree!  But this is how the current mapping of Airflow works. Currently, the `Connection` object uses the field `schema` to specify the database (not schema) for the database type connection, i.e. Postgres, MySQL, etc. And consequently for now, all `DbApiHook`s use the field schema to specify the database to connect to and use `extra` to specify the schema if applicable. Here is a visual mapping:
   
   ```
   Connection(schema) -> DbApiHook (schema) -> DB API (dbname/database) 
   ```
   
   About the field `database`, it's also used across the codebase to means an actual `database`.
   
   Though I think it will be nice to consolidate these concepts, they should be addressed in a separate PR or worth a discussion in the `dev` list. I can send an email in the dev list if you don't object.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org