You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/06 21:05:34 UTC

[GitHub] [airflow] kazanzhy opened a new pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

kazanzhy opened a new pull request #21307:
URL: https://github.com/apache/airflow/pull/21307


   Attribute `description` of simple psycopg2 cursor is available after `.execute()` method.  
   But for the named cursor (server-side cursor) any `.fetch*` method call is needed.
   
   In case of an empty result set like `.execute("SELECT 1 LIMIT 0")`:
   - for the simple cursor, there are initialized `.description` and an empty iterator
   - for the server-side cursor, there are not initialized `.description`and an empty original iterator.
   But also we have `self.rows=[None]`, as a result of `fetchone()` on an empty iterator of the server-side cursor. 
   This `None` makes `_PostgresServerSideCursorDecorator` non-empty iterator.
   
   ~~I decided to remove `rows` attribute because it could consist of a maximum of one row, and also psycopg2 server-side cursor have `.scroll()` method and it is safe to call `.scroll()` on empty resultset even without `.fetchone()`~~
   
   closes #20007
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #21307:
URL: https://github.com/apache/airflow/pull/21307


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy edited a comment on pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
kazanzhy edited a comment on pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#issuecomment-1030223628


   Today I made few tests:
   
   Current version (code from `main` branch):
   - Query returns rows and `use_server_side_cursor=False`: file with data
   - Query returns rows and `use_server_side_cursor=True`: file with data
   - Query returns nothing and `use_server_side_cursor=False`: empty file
   - Query return nothing and `use_server_side_cursor=True`: ERROR
   
   Updated version (this PR code):
   - Query returns rows and `use_server_side_cursor=False`: file with data
   - Query returns rows and `use_server_side_cursor=True`: file with data
   - Query returns nothing and `use_server_side_cursor=False`: empty file
   - Query return nothing and `use_server_side_cursor=True`: empty file
   
   The last one is not what is expected in the issue but I think creating the empty file is the right behavior
   > I'm expected, that operator on empty table not creating file and no upload it on Google Cloud.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#issuecomment-1029594042


   > Actually, I think `_PostgresServerSideCursorDecorator` could be removed if
   
   Are these existing tests to verify the behaviour of `use_server_side_cursor`? This suggested change makes sense to me but maybe I missed some cases expected by users.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy commented on pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#issuecomment-1030223628


   Today I made few tests:
   
   Current version (code from `main` branch):
   - Query returns rows and use_server_side_cursor=False: file with data
   - Query returns rows and use_server_side_cursor=True: file with data
   - Query returns nothing and use_server_side_cursor=False: empty file
   - Query return nothing and use_server_side_cursor=True: ERROR
   
   Updated version (this PR code):
   - Query returns rows and use_server_side_cursor=False: file with data
   - Query returns rows and use_server_side_cursor=True: file with data
   - Query returns nothing and use_server_side_cursor=False: empty file
   - Query return nothing and use_server_side_cursor=True: empty file
   
   The last one is not what is expected in the issue but I think creating the empty file is the right behavior
   > I'm expected, that operator on empty table not creating file and no upload it on Google Cloud.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#discussion_r799129172



##########
File path: airflow/providers/google/cloud/transfers/postgres_to_gcs.py
##########
@@ -41,26 +41,22 @@ class _PostgresServerSideCursorDecorator:
 
     def __init__(self, cursor):
         self.cursor = cursor
-        self.rows = []
         self.initialized = False
 
     def __iter__(self):
         return self
 
     def __next__(self):
-        if self.rows:
-            return self.rows.pop()
-        else:
-            self.initialized = True
-            return next(self.cursor)
+        self.initialized = True
+        return next(self.cursor)
 
     @property
     def description(self):
         """Fetch first row to initialize cursor description when using server side cursor."""
         if not self.initialized:
-            element = self.cursor.fetchone()
-            self.rows.append(element)
             self.initialized = True
+            self.cursor.fetchone()
+            self.cursor.scroll(0, mode='absolute')

Review comment:
       I like your current approach better.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy edited a comment on pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
kazanzhy edited a comment on pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#issuecomment-1029413486


   ~~Actually, I think `_PostgresServerSideCursorDecorator` could be removed if `PostgresToGCSOperator.query` will be modified in this way:~~
   ```
   ...
   if self.use_server_side_cursor:
       cursor.itersize = self.cursor_itersize
       cursor.fetchone()
       cursor.scroll(0, mode='absolute')
    return cursor
    ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #21307:
URL: https://github.com/apache/airflow/pull/21307


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy edited a comment on pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
kazanzhy edited a comment on pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#issuecomment-1030223628


   Today I made few tests:
   
   Current version (code from `main` branch):
   - Query returns rows and `use_server_side_cursor=False`: file with data
   - Query returns rows and `use_server_side_cursor=True`: file with data
   - Query returns nothing and `use_server_side_cursor=False`: empty file
   - Query returns nothing and `use_server_side_cursor=True`: ERROR
   
   Updated version (this PR code):
   - Query returns rows and `use_server_side_cursor=False`: file with data
   - Query returns rows and `use_server_side_cursor=True`: file with data
   - Query returns nothing and `use_server_side_cursor=False`: empty file
   - Query returns nothing and `use_server_side_cursor=True`: empty file
   
   The last one is not what is expected in the issue but I think creating the empty file is the right behavior
   > I'm expected, that operator on empty table not creating file and no upload it on Google Cloud.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy edited a comment on pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
kazanzhy edited a comment on pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#issuecomment-1029413486






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy edited a comment on pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
kazanzhy edited a comment on pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#issuecomment-1029413486


   ~~Actually, I think _PostgresServerSideCursorDecorator could be removed if PostgresToGCSOperator.query will be modified in this way: ~~
   ```
   ...
   if self.use_server_side_cursor:
       cursor.itersize = self.cursor_itersize
       cursor.fetchone()
       cursor.scroll(0, mode='absolute')
    return cursor
    ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #21307:
URL: https://github.com/apache/airflow/pull/21307


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy commented on a change in pull request #21307: Fixed error in PostgresToGCSOperator on the empty resultset

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on a change in pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#discussion_r798974645



##########
File path: airflow/providers/google/cloud/transfers/postgres_to_gcs.py
##########
@@ -41,26 +41,22 @@ class _PostgresServerSideCursorDecorator:
 
     def __init__(self, cursor):
         self.cursor = cursor
-        self.rows = []
         self.initialized = False
 
     def __iter__(self):
         return self
 
     def __next__(self):
-        if self.rows:
-            return self.rows.pop()
-        else:
-            self.initialized = True
-            return next(self.cursor)
+        self.initialized = True
+        return next(self.cursor)
 
     @property
     def description(self):
         """Fetch first row to initialize cursor description when using server side cursor."""
         if not self.initialized:
-            element = self.cursor.fetchone()
-            self.rows.append(element)
             self.initialized = True
+            self.cursor.fetchone()
+            self.cursor.scroll(0, mode='absolute')

Review comment:
       If `.scroll(-1)` is more clear then I change




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy commented on pull request #21307: Fixed error in PostgresToGCSOperator on the empty resultset

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#issuecomment-1029413486


   Actually, I think `_PostgresServerSideCursorDecorator` could be removed if `PostgresToGCSOperator.query` will be modified in this way:
   
   ```
   ...
   if self.use_server_side_cursor:
       cursor.itersize = self.cursor_itersize
       cursor.fetchone()
       cursor.scroll(0, mode='absolute')
    return cursor
    ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy commented on a change in pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on a change in pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#discussion_r799663382



##########
File path: airflow/providers/google/cloud/transfers/postgres_to_gcs.py
##########
@@ -41,26 +41,22 @@ class _PostgresServerSideCursorDecorator:
 
     def __init__(self, cursor):
         self.cursor = cursor
-        self.rows = []
         self.initialized = False
 
     def __iter__(self):
         return self
 
     def __next__(self):
-        if self.rows:
-            return self.rows.pop()
-        else:
-            self.initialized = True
-            return next(self.cursor)
+        self.initialized = True
+        return next(self.cursor)
 
     @property
     def description(self):
         """Fetch first row to initialize cursor description when using server side cursor."""
         if not self.initialized:
-            element = self.cursor.fetchone()
-            self.rows.append(element)
             self.initialized = True
+            self.cursor.fetchone()
+            self.cursor.scroll(0, mode='absolute')

Review comment:
       Sorry for the misleading.
   
   First of all server-side cursor doesn't support `relative` mode by default. It should be initialized like `conn.cursor(name='...', scrollable=True)`. But absolute mode could be used.
   
   But both scrolls cause a reset of `.description` attribute. So, unfortunately, they could not be used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy commented on a change in pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on a change in pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#discussion_r799663382



##########
File path: airflow/providers/google/cloud/transfers/postgres_to_gcs.py
##########
@@ -41,26 +41,22 @@ class _PostgresServerSideCursorDecorator:
 
     def __init__(self, cursor):
         self.cursor = cursor
-        self.rows = []
         self.initialized = False
 
     def __iter__(self):
         return self
 
     def __next__(self):
-        if self.rows:
-            return self.rows.pop()
-        else:
-            self.initialized = True
-            return next(self.cursor)
+        self.initialized = True
+        return next(self.cursor)
 
     @property
     def description(self):
         """Fetch first row to initialize cursor description when using server side cursor."""
         if not self.initialized:
-            element = self.cursor.fetchone()
-            self.rows.append(element)
             self.initialized = True
+            self.cursor.fetchone()
+            self.cursor.scroll(0, mode='absolute')

Review comment:
       Sorry for the misleading.
   
   First of all server-side cursor doesn't support `relative` mode by default. It should be initialized like `conn.cursor(name='...', scrollable=True). But absolute mode could be used.
   
   But both scrolls cause a reset of `.description` attribute. So, unfortunately, they could not be used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kazanzhy commented on a change in pull request #21307: Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True

Posted by GitBox <gi...@apache.org>.
kazanzhy commented on a change in pull request #21307:
URL: https://github.com/apache/airflow/pull/21307#discussion_r799663382



##########
File path: airflow/providers/google/cloud/transfers/postgres_to_gcs.py
##########
@@ -41,26 +41,22 @@ class _PostgresServerSideCursorDecorator:
 
     def __init__(self, cursor):
         self.cursor = cursor
-        self.rows = []
         self.initialized = False
 
     def __iter__(self):
         return self
 
     def __next__(self):
-        if self.rows:
-            return self.rows.pop()
-        else:
-            self.initialized = True
-            return next(self.cursor)
+        self.initialized = True
+        return next(self.cursor)
 
     @property
     def description(self):
         """Fetch first row to initialize cursor description when using server side cursor."""
         if not self.initialized:
-            element = self.cursor.fetchone()
-            self.rows.append(element)
             self.initialized = True
+            self.cursor.fetchone()
+            self.cursor.scroll(0, mode='absolute')

Review comment:
       Sorry for the misleading.
   
   First of all server-side cursor doesn't support `relative` mode by default. It should be initialized like `conn.cursor(name='...', scrollable=True)`. But absolute mode worked without initialization.
   
   But both scrolls cause a reset of `.description` attribute. So, unfortunately, they could not be used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org