You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/29 20:08:46 UTC

[GitHub] [airflow] kazanzhy opened a new pull request, #25412: Fixing JdbcOperator non-SELECT statement run

kazanzhy opened a new pull request, #25412:
URL: https://github.com/apache/airflow/pull/25412

   I added a very unpleasant bug in #23817 to close #19313.
   It was adding `fetchall` to JdbcHook in JdbcOperator.
   But error raises when running not select statements as described in #25388
   
   My main question for other maintainers is which solution is better
   - adding parameter `fetch_results: bool` 
   - give users the ability to pass their own handlers
   
   Anyway, for more compatibility and unification we have to decide the default behavior of SQL operators in terms of returning results.
   
   My opinion is adding `handler` parameter and no Xcom push by default, like:
   ``handler: Optional[Callable] = None``


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25412:
URL: https://github.com/apache/airflow/pull/25412#issuecomment-1207366904

   > Is that planned ? I am asking after seeing this commit's message: "move all old sql operator...": [acab8f5](https://github.com/apache/airflow/commit/acab8f52dd8d90fd6583779127895dd343780f79)
   
   Maybe - if you want to add such refactor - feel free :)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kazanzhy commented on pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on PR #25412:
URL: https://github.com/apache/airflow/pull/25412#issuecomment-1199911458

   Hi @cdabella @nkyuray
   Probably, changes that fixed #19313 will be rolled back but with adding some flexibility


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] FanatoniQ commented on pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
FanatoniQ commented on PR #25412:
URL: https://github.com/apache/airflow/pull/25412#issuecomment-1207365681

   > Yep. I think it's not technically needed now - the effect is the same, though the difference is that it allows to override the handler in JDBC, which might be a useful thing.
   
   Ok I agree.
   
   Quick question: shouldn't this handler parameter, and sql operator run methods return value feature used alongside `do_xcom_push` be coded in a common parent SQL class ? I found the other day that `MsSqlOperator` and `MySqlOperator` run methods do not return the result...
   
   Is that planned ? I am asking after seeing this commit's message: "move all old sql operator...":  https://github.com/apache/airflow/commit/acab8f52dd8d90fd6583779127895dd343780f79


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kazanzhy commented on a diff in pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on code in PR #25412:
URL: https://github.com/apache/airflow/pull/25412#discussion_r934972035


##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        fetch_results: bool = False,
         **kwargs,
     ) -> None:
         super().__init__(**kwargs)
         self.parameters = parameters
         self.sql = sql
         self.jdbc_conn_id = jdbc_conn_id
         self.autocommit = autocommit
+        self.fetch_results = fetch_results
         self.hook = None
 
     def execute(self, context: 'Context'):
         self.log.info('Executing: %s', self.sql)
         hook = JdbcHook(jdbc_conn_id=self.jdbc_conn_id)
-        return hook.run(self.sql, self.autocommit, parameters=self.parameters, handler=fetch_all_handler)
+        if self.fetch_results:

Review Comment:
   No, you're right. It's I missed that no additional parameter needed if `do_xcom_push` is already present



##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        fetch_results: bool = False,
         **kwargs,
     ) -> None:
         super().__init__(**kwargs)
         self.parameters = parameters
         self.sql = sql
         self.jdbc_conn_id = jdbc_conn_id
         self.autocommit = autocommit
+        self.fetch_results = fetch_results
         self.hook = None
 
     def execute(self, context: 'Context'):
         self.log.info('Executing: %s', self.sql)
         hook = JdbcHook(jdbc_conn_id=self.jdbc_conn_id)
-        return hook.run(self.sql, self.autocommit, parameters=self.parameters, handler=fetch_all_handler)
+        if self.fetch_results:

Review Comment:
   No, you're right. I missed that no additional parameter is needed if `do_xcom_push` is already present



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25412:
URL: https://github.com/apache/airflow/pull/25412#discussion_r935125171


##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        handler: Callable = fetch_all_handler,

Review Comment:
   ```suggestion
           handler: Callable[[CursorResult], Optional[List[Tuple]]] = fetch_all_handler,
   ```
   
   (plus needed imports)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] FanatoniQ commented on a diff in pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
FanatoniQ commented on code in PR #25412:
URL: https://github.com/apache/airflow/pull/25412#discussion_r935171435


##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        handler: Callable = fetch_all_handler,

Review Comment:
   I dont think we should type hint here since the fetchall handler must not be sqlalchemy but bdapi2 compliant (`DbApiHook.run` does not use sqlalchemy).
   
   Does the `CursorResult` import comes from `jaydebeapi` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kazanzhy commented on a diff in pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on code in PR #25412:
URL: https://github.com/apache/airflow/pull/25412#discussion_r935788196


##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        handler: Callable = fetch_all_handler,

Review Comment:
   Yes, `Callable[[Any], Any]` should be enough, because initially, I thought about:
   
   ```
   from airflow.typing_compat import Protocol
   
   class Cursor(Protocol):
       description: Optional[Tuple]
       fetchall: Callable[[], List[Tuple]]
   
   handler: Callable[[Optional[List[Tuple]]], Cursor]
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25412:
URL: https://github.com/apache/airflow/pull/25412#issuecomment-1207363702

   Yep. I think it's not technically needed now - the effect is the same, though the difference is that it allows to override the handler in JDBC, which might be a useful thing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25412:
URL: https://github.com/apache/airflow/pull/25412#discussion_r935180155


##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        handler: Callable = fetch_all_handler,

Review Comment:
   It should probably be `Callable[[Any], Any]` then? The goal is to make sure the handler accepts one argument, and returns something.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25412:
URL: https://github.com/apache/airflow/pull/25412#discussion_r935180155


##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        handler: Callable = fetch_all_handler,

Review Comment:
   It should probably be `Callable[[Any], Any]` then? The goal is to make sure the handler accepts exactly one argument, and returns something.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell commented on pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
josh-fell commented on PR #25412:
URL: https://github.com/apache/airflow/pull/25412#issuecomment-1200042069

   >My main question for other maintainers is which solution is better
   > - adding parameter fetch_results: bool
   > - give users the ability to pass their own handlers
   
   It might not be what is _better_, but perhaps why not do both? Have users control if they want to fetch results from the query they execute and the option to customize how to handle said results (with `fetch_all_handler` as a default even).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #25412:
URL: https://github.com/apache/airflow/pull/25412


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] FanatoniQ commented on pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
FanatoniQ commented on PR #25412:
URL: https://github.com/apache/airflow/pull/25412#issuecomment-1207362466

   @potiuk @kazanzhy Didn't we fix #25388 with #25430 ? `fetch_all_handler` should be fixed now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25412:
URL: https://github.com/apache/airflow/pull/25412#discussion_r935125171


##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        handler: Callable = fetch_all_handler,

Review Comment:
   ```suggestion
           handler: Callable[[CursorResult], Any] = fetch_all_handler,
   ```
   
   (plus needed imports)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kazanzhy commented on pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on PR #25412:
URL: https://github.com/apache/airflow/pull/25412#issuecomment-1199912104

   @akashgangulyhf this PR is devoted to #25388. Please join to discussions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #25412:
URL: https://github.com/apache/airflow/pull/25412#discussion_r933764651


##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        fetch_results: bool = False,
         **kwargs,
     ) -> None:
         super().__init__(**kwargs)
         self.parameters = parameters
         self.sql = sql
         self.jdbc_conn_id = jdbc_conn_id
         self.autocommit = autocommit
+        self.fetch_results = fetch_results
         self.hook = None
 
     def execute(self, context: 'Context'):
         self.log.info('Executing: %s', self.sql)
         hook = JdbcHook(jdbc_conn_id=self.jdbc_conn_id)
-        return hook.run(self.sql, self.autocommit, parameters=self.parameters, handler=fetch_all_handler)
+        if self.fetch_results:

Review Comment:
   Why do we need this parameter?
   This is operator not hook so it looks like fetch_results = do_xcom_push ?
   Did I miss something?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kazanzhy commented on a diff in pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on code in PR #25412:
URL: https://github.com/apache/airflow/pull/25412#discussion_r935788196


##########
airflow/providers/jdbc/operators/jdbc.py:
##########
@@ -57,16 +57,21 @@ def __init__(
         jdbc_conn_id: str = 'jdbc_default',
         autocommit: bool = False,
         parameters: Optional[Union[Iterable, Mapping]] = None,
+        handler: Callable = fetch_all_handler,

Review Comment:
   Yes, `Callable[[Any], Any]` should be enough, because initially, I thought about:
   
   ```
   from airflow.typing_compat import Protocol
   
   class Cursor(Protocol):
       description: Optional[Tuple]
       fetchall: Callable[[], List[Tuple]]
   
   handler: Callable[[Cursor], Optional[List[Tuple]]]
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25412:
URL: https://github.com/apache/airflow/pull/25412#issuecomment-1207360844

   Some intermittent issues. Re-running.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kazanzhy commented on pull request #25412: Fixing JdbcOperator non-SELECT statement run

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on PR #25412:
URL: https://github.com/apache/airflow/pull/25412#issuecomment-1207384199

   @FanatoniQ I'm already working on https://github.com/apache/airflow/issues/25259.
   It might answer your question :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org