You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/04 20:39:33 UTC

[GitHub] [airflow] potiuk opened a new pull request #17423: Make schema in DBApiHook private

potiuk opened a new pull request #17423:
URL: https://github.com/apache/airflow/pull/17423


   There was a change in #16521 that introduced schema field in
   DBApiHook, but unfortunately using it in provider Hooks deriving
   from DBApiHook is backwards incompatible for Airflow 2.1 and below.
   
   This caused Postgres 2.1.0 release backwards incompatibility and
   failures for Airflow 2.1.0.
   
   Since the change is small and most of DBApi-derived hooks already
   set the schema field on their own, the best approach is to
   make the schema field private for the DBApiHook and make a change
   in Postgres Hook to store the schema in the same way as all other
   operators.
   
   Fixes: #17422
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682957637



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Instead, maybe this:
   ```suggestion
           if not hasattr(self, "schema"):
               self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   That should work for core both pre and post 2.2.0 and wouldn't break behavior of `get_uri` either like doing `__schema` does.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Good catch. One thing I don't love is now we possibly have both `__schema` and `schema` being used. It's particularly ugly if one wanted to change the schema after init. What if we just let the hooks also set `schema`?

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Sorry, I should have included an example. This is what I meant:
   
   ```
   h = SomeHook(schema="foo")
   h.get_uri() # uses foo
   h.schema = "bar"
   h.get_uri() # still uses foo because of __schema
   ```
   
   That's why I'm thinking letting both `DbApiHook` and any derived hooks both set `schema` might be the best of both worlds here? Then the derived hooks could stop setting `schema` in Airflow 3 (or the next time they get a min core version bump really).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#issuecomment-893288824


   Fixed all the problems - looking for other's opinion of what will be the best solution to DBApiHook modifications vs. Providers.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682957637



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Instead, maybe this:
   ```suggestion
           if not hasattr(self, "schema"):
               self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   That should work for core both pre and post 2.2.0 and wouldn't break behavior of `get_uri` either like doing `__schema` does.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682966113



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       The problem is that having `schema` as attribute in DBApiHook will encourage people to use it in their other Hooks deriving from DBApiHook. It will be rather difficult to make sure that everyone is using it in "hasattr" way. Even if we two remember it now, we will forget about it and other committers who are not involved will not even know about this. We could potentially automate a pre-commit check if the "DBApiHook's schema" is accessed with hasattr, but I think we will never be able to do it with 100% accuracy and I think it's simply not worth it for that single field, that can be simply replaced with single line for each operator that wants to use it (in __init__):
   
   ```
           self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   I think we should think about DBApiHook as "Public" API of Airflow and any change to it should be very, very carefully considered.
   
   Another option would be to add `>=Airflow 2.2` limitation to Postgres operator (and any other operator that uses it), but again I think sacrificing backwards compatibility in this case is simply not worth it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682966113



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       The problem is that having `schema` as attribute in DBApiHook will encourage people to use it in their other Hooks deriving from DBApiHook. It will be rather difficult to make sure that everyone is using it in "hasattr" way. Even if we two remember it now, we will forget about it and other committers who are not involved will not even know about this. We could potentially automate a pre-commit check if the "DBApiHook" is accessed with hasattr, but I think we will never be able to do it with 100% accuracy and I think it's simply not worth it for that single field, that can be simply replaced with single line for each operator that wants to use it (in __init__):
   
   ```
           self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   I think we should think about DBApiHook as "Public" API of Airflow and any change to it should be very, very carefully considered.
   
   Another option would be to add `>=Airflow 2.2` limitation to Postgres operator (and any other operator that uses it), but again I think sacrificing backwards compatibility in this case is simply not worth it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682966113



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       The problem is that having `schema` as attribute in DBApiHook will encourage people to use it in their other operators deriving from DBApi. It will be rather difficult to make sure that everyone is using it in "hasattr" way. Even if we two remember it now, we will forget about it and other committers who are not involved will not even know about this. We could potentially automate a pre-commit check if the "DBApiHook" is accessed with hasattr, but I think we will never be able to do it with 100% accuracy and I think it's simply not worth it for that single field, that can be simply replaced with single line for each operator that wants to use it (in __init__):
   
   ```
           self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   I think we should think about DBApiHook as "Public" API of Airflow and any change to it should be very, very carefully considered.
   
   Another option would be to add `>=Airflow 2.2` limitation to Postgres operator (and any other operator that uses it), but again I think sacrificing backwards compatibility in this case is simply not worth it.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       The problem is that having `schema` as attribute in DBApiHook will encourage people to use it in their other Hooks deriving from DBApiHook. It will be rather difficult to make sure that everyone is using it in "hasattr" way. Even if we two remember it now, we will forget about it and other committers who are not involved will not even know about this. We could potentially automate a pre-commit check if the "DBApiHook" is accessed with hasattr, but I think we will never be able to do it with 100% accuracy and I think it's simply not worth it for that single field, that can be simply replaced with single line for each operator that wants to use it (in __init__):
   
   ```
           self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   I think we should think about DBApiHook as "Public" API of Airflow and any change to it should be very, very carefully considered.
   
   Another option would be to add `>=Airflow 2.2` limitation to Postgres operator (and any other operator that uses it), but again I think sacrificing backwards compatibility in this case is simply not worth it.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       The problem is that having `schema` as attribute in DBApiHook will encourage people to use it in their other Hooks deriving from DBApiHook. It will be rather difficult to make sure that everyone is using it in "hasattr" way. Even if we two remember it now, we will forget about it and other committers who are not involved will not even know about this. We could potentially automate a pre-commit check if the "DBApiHook's schema" is accessed with hasattr, but I think we will never be able to do it with 100% accuracy and I think it's simply not worth it for that single field, that can be simply replaced with single line for each operator that wants to use it (in __init__):
   
   ```
           self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   I think we should think about DBApiHook as "Public" API of Airflow and any change to it should be very, very carefully considered.
   
   Another option would be to add `>=Airflow 2.2` limitation to Postgres operator (and any other operator that uses it), but again I think sacrificing backwards compatibility in this case is simply not worth it.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Ah actually I see an error .in my solution.. I should not do "pop" in the args :)

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Removed "pop"  from DBApi to leave it for other hooks. 
   
   Actually I just realized the way it was implemented before - with pop() had undesired side effect for Hooks that already used  kwargs.pop() .The side effect was that kwargs.pop() in DBApiHook would remove the schema from kwargs in derived classes and their `self.schema = kwargs.pop("schema", None)` would override the schema to None (!)
   
   Example: mysql:
    
   https://github.com/apache/airflow/blob/main/airflow/providers/mysql/hooks/mysql.py#L61
   
   So in fact, the original change (not yet released luckily) in DBApiHook was even more disruptive than the failed `PostgresHook`. I am actually glad it was uncovered now, as it would be far more disruptive it was released in Airflow.
   
   That's why we should be **extremely** careful with changing DBApiHook (and BaseHook and similar).
   

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Removed "pop"  from DBApi to leave it for other hooks. 
   
   Actually I just realized the way it was implemented before - with pop() had undesired side effect for Hooks that already used  kwargs.pop() .The side effect was that kwargs.pop() in DBApiHook would remove the `schema` from kwargs in derived classes and their `self.schema = kwargs.pop("schema", None)` would override the schema to None (!)
   
   Example: mysql:
    
   https://github.com/apache/airflow/blob/main/airflow/providers/mysql/hooks/mysql.py#L61
   
   So in fact, the original change (not yet released luckily) in DBApiHook was even more disruptive than the failed `PostgresHook`. I am actually glad it was uncovered now, as it would be far more disruptive it was released in Airflow.
   
   That's why we should be **extremely** careful with changing DBApiHook (and BaseHook and similar).
   

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       😱 🙀 
   

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       I think this is unlikely edge case (much less likely than use of schema when it will be defined in DBApiHook). I hardly see the use for that. Why somone would like to define a DBApi -derived hook and pass a 'schema' parameter to it while also changing it in it's own `__init__`? Seems extremely unlikely, also It feels natural that in such case the change should be done before `super.__init__()` rather than after.   
   
   We can assume that "schema" is our convention for kwarg para that we should use for all DB Hooks.  We can standardise it (it's already commonly used)  and release in Airflow 3 DBApiHook I think. 
   
   For now we might want to add some more docs/comments explaining it and some Airflow 3 notice about it. WDYT? 

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       I think this is unlikely edge case (much less likely than use of schema when it will be defined in DBApiHook as public field). I hardly see the use for that. Why somone would like to define a DBApi -derived hook and pass a 'schema' parameter to it while also changing it in it's own `__init__`? Seems extremely unlikely, also It feels natural that in such case the change should be done before `super.__init__()` rather than after.   
   
   We can assume that "schema" is our convention for kwarg para that we should use for all DB Hooks.  We can standardise it (it's already commonly used)  and release in Airflow 3 DBApiHook I think. 
   
   For now we might want to add some more docs/comments explaining it and some Airflow 3 notice about it. WDYT? 

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Right - I think i fiigured out better way of doing it. I've added the 'schema' argument explicitly as optional kwarg to DBApiHook and made a comment about the "change schema value" in the description of the parameter.
   
   I think it's much better - it makes schema an explicit part of the DBHookAPI hook's API, it is fully backwards compatible and it makes it very easy for Airflow 3 change - we will simply make the self.schema public and convert all the operators that use it in the "old" way.  See the latest fixup.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       > That's why I'm thinking letting both DbApiHook and any derived hooks both set schema might be the best of both worlds here? Then the derived hooks could stop setting schema in Airflow 3 (or the next time they get a min core version bump really).
   
   The problem with this is - that it has already happened that we overlooked that the `DBApiHook.schema` object has been accessed directly by a provider, which made it backwards incompatible with Airflow 2.1. We do not have any mechanism to prevents this is in the future again, if we have it as a public field in DBApiHook. This is a bit of a problem that DBAPiHook is "public API" part and by making a public field in this class, we change the API.
   
   The way I proposed - in the last fixup, `schema` becomes part of the DBApi 'initialization' API. If any other hook stores the schema field as self.schema - so be it, it is "its own responsibiliity" - we make it clear now in the constructor of the DBApiHook that passing "schema" as kwargs is THE way how to override the schema, and by not having a public field, we clearly say that the deriving hook cannot expect that it can change it and expect "DBApiHook" getUri() method will use it.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       > That's why I'm thinking letting both DbApiHook and any derived hooks both set schema might be the best of both worlds here? Then the derived hooks could stop setting schema in Airflow 3 (or the next time they get a min core version bump really).
   
   The problem with this is - that it has already happened that we overlooked that the `DBApiHook.schema` object has been accessed directly by a provider, which made it backwards incompatible with Airflow 2.1. We do not have any mechanism to prevents this is in the future again, if we have it as a public field in DBApiHook. This is a bit of a problem that DBAPiHook is "public API" part and by making a public field in this class, we change the API.
   
   The way I proposed - in the last fixup, `schema` becomes part of the DBApiHook 'initialization' API. If any other hook stores the schema field as self.schema - so be it, it is "its own responsibiliity" - we make it clear now in the constructor of the DBApiHook that passing "schema" as kwargs is THE way how to override the schema, and by not having a public field, we clearly say that the deriving hook cannot expect that it can change it and expect "DBApiHook" getUri() method will use it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r683017452



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       > That's why I'm thinking letting both DbApiHook and any derived hooks both set schema might be the best of both worlds here? Then the derived hooks could stop setting schema in Airflow 3 (or the next time they get a min core version bump really).
   
   The problem with this is - that it has already happened that we overlooked that the `DBApiHook.schema` object has been accessed directly by a provider, which made it backwards incompatible with Airflow 2.1. We do not have any mechanism to prevents this is in the future again, if we have it as a public field in DBApiHook. This is a bit of a problem that DBAPiHook is "public API" part and by making a public field in this class, we change the API.
   
   The way I proposed - in the last fixup, `schema` becomes part of the DBApi 'initialization' API. If any other hook stores the schema field as self.schema - so be it, it is "its own responsibiliity" - we make it clear now in the constructor of the DBApiHook that passing "schema" as kwargs is THE way how to override the schema, and by not having a public field, we clearly say that the deriving hook cannot expect that it can change it and expect "DBApiHook" getUri() method will use it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682966113



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       The problem is that having `schema` as attribute in DBApiHook will encourage people to use it in their other operators deriving from DBApi. It will be rather difficult to make sure that everyone is using it in "hasattr" way. Even if we two remember it now, we will forget about it and other committers who are not involved will not even know about this. We could potentially automate a pre-commit check if the "DBApiHook" is accessed with hasattr, but I think we will never be able to do it with 100% accuracy and I think it's simply not worth it for that single field, that can be simply replaced with single line for each operator that wants to use it (in __init__):
   
   ```
           self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   I think we should think about DBApiHook as "Public" API of Airflow and any change to it should be very, very carefully considered.
   
   Another option would be to add `>=Airflow 2.2` limitation to Postgres operator (and any other operator that uses it), but again I think sacrificing backwards compatibility in this case is simply not worth it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r683009731



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Sorry, I should have included an example. This is what I meant:
   
   ```
   h = SomeHook(schema="foo")
   h.get_uri() # uses foo
   h.schema = "bar"
   h.get_uri() # still uses foo because of __schema
   ```
   
   That's why I'm thinking letting both `DbApiHook` and any derived hooks both set `schema` might be the best of both worlds here? Then the derived hooks could stop setting `schema` in Airflow 3 (or the next time they get a min core version bump really).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r683012725



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Right - I think i fiigured out better way of doing it. I've added the 'schema' argument explicitly as optional kwarg to DBApiHook and made a comment about the "change schema value" in the description of the parameter.
   
   I think it's much better - it makes schema an explicit part of the DBHookAPI hook's API, it is fully backwards compatible and it makes it very easy for Airflow 3 change - we will simply make the self.schema public and convert all the operators that use it in the "old" way.  See the latest fixup.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #17423:
URL: https://github.com/apache/airflow/pull/17423


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#issuecomment-894684457


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682983517



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Removed "pop"  from DBApi to leave it for other hooks. 
   
   Actually I just realized the way it was implemented before - with pop() had undesired side effect for Hooks that already used  kwargs.pop() .The side effect was that kwargs.pop() in DBApiHook would remove the schema from kwargs in derived classes and their `self.schema = kwargs.pop("schema", None)` would override the schema to None (!)
   
   Example: mysql:
    
   https://github.com/apache/airflow/blob/main/airflow/providers/mysql/hooks/mysql.py#L61
   
   So in fact, the original change (not yet released luckily) in DBApiHook was even more disruptive than the failed `PostgresHook`. I am actually glad it was uncovered now, as it would be far more disruptive it was released in Airflow.
   
   That's why we should be **extremely** careful with changing DBApiHook (and BaseHook and similar).
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682983517



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Removed "pop"  from DBApi to leave it for other hooks. 
   
   Actually I just realized the way it was implemented before - with pop() had undesired side effect for Hooks that already used  kwargs.pop() .The side effect was that kwargs.pop() in DBApiHook would remove the `schema` from kwargs in derived classes and their `self.schema = kwargs.pop("schema", None)` would override the schema to None (!)
   
   Example: mysql:
    
   https://github.com/apache/airflow/blob/main/airflow/providers/mysql/hooks/mysql.py#L61
   
   So in fact, the original change (not yet released luckily) in DBApiHook was even more disruptive than the failed `PostgresHook`. I am actually glad it was uncovered now, as it would be far more disruptive it was released in Airflow.
   
   That's why we should be **extremely** careful with changing DBApiHook (and BaseHook and similar).
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682984695



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       😱 🙀 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#issuecomment-893288824


   Fixed all the problems - looking for other's opinion of what will be the best solution to DBApiHook modifications vs. Providers.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r683000067



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       I think this is unlikely edge case (much less likely than use of schema when it will be defined in DBApiHook). I hardly see the use for that. Why somone would like to define a DBApi -derived hook and pass a 'schema' parameter to it while also changing it in it's own `__init__`? Seems extremely unlikely, also It feels natural that in such case the change should be done before `super.__init__()` rather than after.   
   
   We can assume that "schema" is our convention for kwarg para that we should use for all DB Hooks.  We can standardise it (it's already commonly used)  and release in Airflow 3 DBApiHook I think. 
   
   For now we might want to add some more docs/comments explaining it and some Airflow 3 notice about it. WDYT? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r684183034



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Any other  comments? I think We will not release an out-of-band Postgres operator, so this is something we will have to solve mid-August, but would be good to get some opinions :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r683000067



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       I think this is unlikely edge case (much less likely than use of schema when it will be defined in DBApiHook as public field). I hardly see the use for that. Why somone would like to define a DBApi -derived hook and pass a 'schema' parameter to it while also changing it in it's own `__init__`? Seems extremely unlikely, also It feels natural that in such case the change should be done before `super.__init__()` rather than after.   
   
   We can assume that "schema" is our convention for kwarg para that we should use for all DB Hooks.  We can standardise it (it's already commonly used)  and release in Airflow 3 DBApiHook I think. 
   
   For now we might want to add some more docs/comments explaining it and some Airflow 3 notice about it. WDYT? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682967313



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Ah actually I see an error .in my solution.. I should not do "pop" in the args :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r683017452



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       > That's why I'm thinking letting both DbApiHook and any derived hooks both set schema might be the best of both worlds here? Then the derived hooks could stop setting schema in Airflow 3 (or the next time they get a min core version bump really).
   
   The problem with this is - that it has already happened that we overlooked that the `DBApiHook.schema` object has been accessed directly by a provider, which made it backwards incompatible with Airflow 2.1. We do not have any mechanism to prevents this is in the future again, if we have it as a public field in DBApiHook. This is a bit of a problem that DBAPiHook is "public API" part and by making a public field in this class, we change the API.
   
   The way I proposed - in the last fixup, `schema` becomes part of the DBApiHook 'initialization' API. If any other hook stores the schema field as self.schema - so be it, it is "its own responsibiliity" - we make it clear now in the constructor of the DBApiHook that passing "schema" as kwargs is THE way how to override the schema, and by not having a public field, we clearly say that the deriving hook cannot expect that it can change it and expect "DBApiHook" getUri() method will use it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #17423: Make schema in DBApiHook private

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682987929



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Good catch. One thing I don't love is now we possibly have both `__schema` and `schema` being used. It's particularly ugly if one wanted to change the schema after init. What if we just let the hooks also set `schema`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org