You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/07 22:27:58 UTC

[GitHub] [airflow] kazanzhy opened a new pull request, #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

kazanzhy opened a new pull request, #26944:
URL: https://github.com/apache/airflow/pull/26944

   Currently `DbApiHook.get_records` and `DbApiHook.get_first` invokes `cursor.execute method.
   This PR is a step to possibly use a single method `DbApiHook.run` for query execution.
   
   closes: #9957
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #26944:
URL: https://github.com/apache/airflow/pull/26944#issuecomment-1843655260

   Not suure - might be problem with the async operators 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #26944:
URL: https://github.com/apache/airflow/pull/26944


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #26944:
URL: https://github.com/apache/airflow/pull/26944#discussion_r1006269450


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -175,41 +207,26 @@ def get_pandas_df_by_chunks(self, sql, parameters=None, *, chunksize, **kwargs):
             yield from psql.read_sql(sql, con=conn, params=parameters, chunksize=chunksize, **kwargs)
 
     def get_records(
-        self,
-        sql: str | list[str],
-        parameters: Iterable | Mapping | None = None,
-        **kwargs: dict,
-    ):
+        self, sql: str | list[str], parameters: Iterable | Mapping | None = None
+    ) -> Any | list[Any]:
         """
         Executes the sql and returns a set of records.
 
-        :param sql: the sql statement to be executed (str) or a list of
-            sql statements to execute
+        :param sql: the sql statement to be executed (str) or a list of sql statements to execute
         :param parameters: The parameters to render the SQL query with.
         """
-        with closing(self.get_conn()) as conn:
-            with closing(conn.cursor()) as cur:
-                if parameters is not None:
-                    cur.execute(sql, parameters)
-                else:
-                    cur.execute(sql)
-                return cur.fetchall()
+        return self.run(sql=sql, parameters=parameters, handler=fetch_all_handler)
 
-    def get_first(self, sql: str | list[str], parameters=None):
+    def get_first(
+        self, sql: str | list[str], parameters: Iterable | Mapping | None = None
+    ) -> Any | list[Any]:

Review Comment:
   Same here I think `Any` should do.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #26944:
URL: https://github.com/apache/airflow/pull/26944#discussion_r1006268023


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -38,6 +41,35 @@ def fetch_all_handler(cursor) -> list[tuple] | None:
         return None
 
 
+def fetch_one_handler(cursor) -> list[tuple] | None:
+    """Handler for DbApiHook.run() to return results"""
+    if cursor.description is not None:
+        return cursor.fetchone()
+    else:
+        return None
+
+
+def _backported_get_hook(connection, *, hook_params=None):
+    """Return hook based on conn_type
+    For supporting Airflow versions < 2.3, we backport "get_hook()" method. This should be removed
+    when "apache-airflow-providers-slack" will depend on Airflow >= 2.3.

Review Comment:
   Agree we cam remove it (and add apache-airflow >= 2.3 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #26944:
URL: https://github.com/apache/airflow/pull/26944#discussion_r1005278875


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -38,6 +41,35 @@ def fetch_all_handler(cursor) -> list[tuple] | None:
         return None
 
 
+def fetch_one_handler(cursor) -> list[tuple] | None:
+    """Handler for DbApiHook.run() to return results"""
+    if cursor.description is not None:
+        return cursor.fetchone()
+    else:
+        return None
+
+
+def _backported_get_hook(connection, *, hook_params=None):
+    """Return hook based on conn_type
+    For supporting Airflow versions < 2.3, we backport "get_hook()" method. This should be removed
+    when "apache-airflow-providers-slack" will depend on Airflow >= 2.3.

Review Comment:
   FYI @potiuk not sure why `common.sql` doesn't have min provider version in the `provider.yaml`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #26944:
URL: https://github.com/apache/airflow/pull/26944#discussion_r1006268609


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -175,41 +207,26 @@ def get_pandas_df_by_chunks(self, sql, parameters=None, *, chunksize, **kwargs):
             yield from psql.read_sql(sql, con=conn, params=parameters, chunksize=chunksize, **kwargs)
 
     def get_records(
-        self,
-        sql: str | list[str],
-        parameters: Iterable | Mapping | None = None,
-        **kwargs: dict,
-    ):
+        self, sql: str | list[str], parameters: Iterable | Mapping | None = None
+    ) -> Any | list[Any]:

Review Comment:
   Isn't `Any | list[Any] == Any`  ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kazanzhy commented on a diff in pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on code in PR #26944:
URL: https://github.com/apache/airflow/pull/26944#discussion_r1007433847


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -175,41 +207,26 @@ def get_pandas_df_by_chunks(self, sql, parameters=None, *, chunksize, **kwargs):
             yield from psql.read_sql(sql, con=conn, params=parameters, chunksize=chunksize, **kwargs)
 
     def get_records(
-        self,
-        sql: str | list[str],
-        parameters: Iterable | Mapping | None = None,
-        **kwargs: dict,
-    ):
+        self, sql: str | list[str], parameters: Iterable | Mapping | None = None
+    ) -> Any | list[Any]:

Review Comment:
   I changed back from `sql: str = ""` to `sql: str | list[str] = ""`.
   It seems strange but without it, I can't remove `# type: ignore[override]`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #26944:
URL: https://github.com/apache/airflow/pull/26944#issuecomment-1843641268

   > back to tuple. - what a mess:-)
   
   No. It is actually common right now. We use to have mess in the PAST . Now it's all straightforward. And the "other" options (like Dictionary) are only for those who want to have backwards compatibility. Mess is mainly because user want to stick to old ways even if we make things better organized and prefer to stay with the messy way it was before. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #26944:
URL: https://github.com/apache/airflow/pull/26944#discussion_r1006269110


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -175,41 +207,26 @@ def get_pandas_df_by_chunks(self, sql, parameters=None, *, chunksize, **kwargs):
             yield from psql.read_sql(sql, con=conn, params=parameters, chunksize=chunksize, **kwargs)
 
     def get_records(
-        self,
-        sql: str | list[str],
-        parameters: Iterable | Mapping | None = None,
-        **kwargs: dict,
-    ):
+        self, sql: str | list[str], parameters: Iterable | Mapping | None = None
+    ) -> Any | list[Any]:

Review Comment:
   I think that should also remove the # type: ignore[override] in subcleases? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first [airflow]

Posted by "shlomiken (via GitHub)" <gi...@apache.org>.
shlomiken commented on PR #26944:
URL: https://github.com/apache/airflow/pull/26944#issuecomment-1843650816

   @potiuk - i think i came out wrong - i really appreciate what have being done (sticking with standard DBAPI) 
   the thing is meant is - we used to have tuple, then upgraded and got dict , now we  understand tuple is the desired. 
   
   anyway - i'm trying to use the latest and greatest , in addition i'm using astronomer async operators - but now
   i get this weird error which i cant understand how its possible .
   ```
   super().__init__(conn_id=snowflake_conn_id, **kwargs)
   E       TypeError: airflow.providers.common.sql.operators.sql.SQLExecuteQueryOperator.__init__() got multiple values for keyword argument 'conn_id'
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #26944:
URL: https://github.com/apache/airflow/pull/26944#issuecomment-1843644984

   So if you want to avoid the mess - convert your code to use tuples and you will be good and everything will be straightend u for you (as is for users who are starting now and everything is the same for all DBAPi Hooks)
   
   Also if you want to know why - you can read more of the architectural decisions made here: https://github.com/apache/airflow/pull/36015 
   
   They explain how we straightened up the mess.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #26944:
URL: https://github.com/apache/airflow/pull/26944#discussion_r1005272207


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -38,6 +41,35 @@ def fetch_all_handler(cursor) -> list[tuple] | None:
         return None
 
 
+def fetch_one_handler(cursor) -> list[tuple] | None:
+    """Handler for DbApiHook.run() to return results"""
+    if cursor.description is not None:
+        return cursor.fetchone()
+    else:
+        return None
+
+
+def _backported_get_hook(connection, *, hook_params=None):
+    """Return hook based on conn_type
+    For supporting Airflow versions < 2.3, we backport "get_hook()" method. This should be removed
+    when "apache-airflow-providers-slack" will depend on Airflow >= 2.3.

Review Comment:
   no need for this any more... the next release of provider has minimum version of 2.3 for all providers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #26944:
URL: https://github.com/apache/airflow/pull/26944#issuecomment-1842873116

   And make sure to fix any deprecation warnings you see and look at the release notes - of Snowflake provider. There were a few breaking changes that you might want to fix (some of the things could be one of them).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #26944:
URL: https://github.com/apache/airflow/pull/26944#issuecomment-1842870944

   Use latest version of both Airflow and Snowflake provider. There were several fixes implemented after that one.
   
   Just make sure to upgrade everything to latest versions  as they contain latest fixes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #26944:
URL: https://github.com/apache/airflow/pull/26944#discussion_r1006267713


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -38,6 +41,35 @@ def fetch_all_handler(cursor) -> list[tuple] | None:
         return None
 
 
+def fetch_one_handler(cursor) -> list[tuple] | None:
+    """Handler for DbApiHook.run() to return results"""
+    if cursor.description is not None:
+        return cursor.fetchone()
+    else:
+        return None
+
+
+def _backported_get_hook(connection, *, hook_params=None):
+    """Return hook based on conn_type
+    For supporting Airflow versions < 2.3, we backport "get_hook()" method. This should be removed
+    when "apache-airflow-providers-slack" will depend on Airflow >= 2.3.

Review Comment:
   No particular reason. Just a mistake :) . I think it would be great to add it :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first [airflow]

Posted by "shlomiken (via GitHub)" <gi...@apache.org>.
shlomiken commented on PR #26944:
URL: https://github.com/apache/airflow/pull/26944#issuecomment-1842494020

   @potiuk - looks like this PR breaks get_first and get_records return type for snowflake for example , so instead of tuple we get a Dict.
   this also breaks SqlSensor 
   
   this code - was getting the default cursor from the connection (`SnowFlakeCursor`) - but now is in the hands of the specific hook  , which in snowflake case return a `DictCursor`
   
   ```
    with closing(conn.cursor()) as cur:
                   results = []
                   for sql_statement in sql:
                       self._run_command(cur, sql_statement, parameters)
   
                       if handler is not None:
                           result = handler(cur)
                           results.append(result)
     ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first [airflow]

Posted by "shlomiken (via GitHub)" <gi...@apache.org>.
shlomiken commented on PR #26944:
URL: https://github.com/apache/airflow/pull/26944#issuecomment-1843099193

   @potiuk - thanks , i kinda figure that out after trying to play with different versions of common-sql and provider.
   what surprised me is that we used to get a tuple, then after upgrading some of the providers we got dict. 
   and now latest version basically support both if a hook parameter is passed for returnning dictionary - meaning default is back to tuple. - what a mess:-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org