Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/20 01:23:02 UTC

[GitHub] [airflow] mik-laj opened a new pull request #13784: Fix two bugs in StackdriverTaskHandler

mik-laj opened a new pull request #13784:
URL: https://github.com/apache/airflow/pull/13784


   Unfortunately, this integration is in rather poor condition: I found a few bugs that prevented it from being used.
   
   The commit https://github.com/apache/airflow/commit/ac943c9e18f75259d531dbda8c51e650f57faa4c#diff-e7f34f73940eb52d92bb991abedc1c963431c5373c12dff739c8fb7d03e93d3aR181 changed the output type of the `read` method. Unfortunately, that change did not update this handler, so attempts to read the log entries failed.
   
   The `flush` method was missing, so the last one or two entries were sometimes not saved. Note that the official Stackdriver logging handler implementation does not have this method either. https://github.com/googleapis/python-logging/blob/master/google/cloud/logging_v2/handlers/handlers.py
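
To illustrate why a missing `flush` can lose the last entries, here is a minimal sketch. It assumes a transport that sends entries in batches; `BufferingTransport` and `FlushingHandler` are hypothetical names for illustration, not the classes used by the Airflow handler or by the Google client library.

```python
import logging


class BufferingTransport:
    """Hypothetical transport that sends entries in batches."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.pending = []  # entries waiting for the next batch
        self.sent = []     # entries that have actually been delivered

    def send(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        self.sent.extend(self.pending)
        self.pending.clear()


class FlushingHandler(logging.Handler):
    """Hypothetical handler that forwards flush() to its transport."""

    def __init__(self, transport):
        super().__init__()
        self.transport = transport

    def emit(self, record):
        self.transport.send(self.format(record))

    def flush(self):
        # Without this override, entries left in the buffer are never pushed.
        self.transport.flush()


transport = BufferingTransport(batch_size=3)
handler = FlushingHandler(transport)
logger = logging.getLogger("flush_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

for i in range(4):
    logger.info("entry %d", i)

sent_before_flush = len(transport.sent)  # the fourth entry is still buffered
handler.flush()
```

In this sketch the fourth entry only reaches `transport.sent` once `flush()` is called, which mirrors the symptom described above.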
   
   
   I also changed the parameters passed to the `entries.list` method; my impression is that the new values work better. They follow the defaults used by `gcloud`.
   
   Close: https://github.com/apache/airflow/issues/13494
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13784: Fix two bugs in StackdriverTaskHandler

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13784:
URL: https://github.com/apache/airflow/pull/13784#issuecomment-763280007


   [The Workflow run](https://github.com/apache/airflow/actions/runs/497496693) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] mik-laj commented on a change in pull request #13784: Fix four bugs in StackdriverTaskHandler

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13784:
URL: https://github.com/apache/airflow/pull/13784#discussion_r568350504



##########
File path: airflow/providers/google/cloud/log/stackdriver_task_handler.py
##########
@@ -188,7 +193,7 @@ def read(
         if next_page_token:
             new_metadata['next_page_token'] = next_page_token
 
-        return [messages], [new_metadata]
+        return [((self.task_instance_hostname, messages),)], [new_metadata]

Review comment:
       ``Tuple[List[Tuple[Tuple[str, str]]], List[Dict[str, str]]]``
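
The nesting in this annotation is easier to follow with concrete values. The sketch below builds and unpacks a result of the same shape as the changed `return` line; `wrap_read_result` and the sample values are hypothetical and are not part of the handler.

```python
def wrap_read_result(hostname, messages, next_page_token=None):
    """Build a (logs, metadata) pair shaped like the new return value."""
    new_metadata = {}
    if next_page_token:
        new_metadata["next_page_token"] = next_page_token
    # Outer list: one element per returned log chunk.
    # Inner tuple: (hostname, messages) pairs for that chunk.
    return [((hostname, messages),)], [new_metadata]


logs, metadatas = wrap_read_result("worker-1", "line 1\nline 2", "token-abc")

# A consumer walks the nesting like this.
pairs = []
for chunk in logs:
    for host, text in chunk:
        pairs.append((host, text))
```

The old return value, `[messages]`, had no hostname level at all, which is why a reader expecting `(host, text)` pairs could not unpack it.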










[GitHub] [airflow] mik-laj commented on a change in pull request #13784: Fix two bugs in StackdriverTaskHandler

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13784:
URL: https://github.com/apache/airflow/pull/13784#discussion_r561283952



##########
File path: tests/providers/google/cloud/log/test_stackdriver_task_handler.py
##########
@@ -35,10 +35,21 @@ def _create_list_response(messages, token):
     return mock.MagicMock(pages=(n for n in [page]), next_page_token=token)
 
 
+def _remove_stackdriver_handlers():

Review comment:
       This is another small fix. An exception was raised when the process exited. It did not cause any failures, only noise in the log.
   ```
   ========================================== 9 failed, 8 passed, 3 skipped, 7 errors in 12.91s ==========================================
   [2021-01-20 19:52:29,766] {_metadata.py:104} WARNING - Compute Engine Metadata server unavailable onattempt 1 of 3. Reason: timed out
   [2021-01-20 19:52:29,769] {_metadata.py:104} WARNING - Compute Engine Metadata server unavailable onattempt 2 of 3. Reason: [Errno 111] Connection refused
   [2021-01-20 19:52:29,774] {_metadata.py:104} WARNING - Compute Engine Metadata server unavailable onattempt 3 of 3. Reason: [Errno 111] Connection refused
   [2021-01-20 19:52:29,774] {_default.py:246} WARNING - Authentication failed using Compute Engine authentication due to unavailable metadata server.
   Error in atexit._run_exitfuncs:
   Traceback (most recent call last):
     File "/usr/local/lib/python3.6/logging/__init__.py", line 1946, in shutdown
       h.close()
     File "/opt/airflow/airflow/providers/google/cloud/log/stackdriver_task_handler.py", line 345, in close
       self._transport.flush()
     File "/usr/local/lib/python3.6/site-packages/cached_property.py", line 36, in __get__
       value = obj.__dict__[self.func.__name__] = self.func(obj)
     File "/opt/airflow/airflow/providers/google/cloud/log/stackdriver_task_handler.py", line 120, in _transport
       return self.transport_type(self._client, self.name)
     File "/usr/local/lib/python3.6/site-packages/cached_property.py", line 36, in __get__
       value = obj.__dict__[self.func.__name__] = self.func(obj)
     File "/opt/airflow/airflow/providers/google/cloud/log/stackdriver_task_handler.py", line 108, in _client
       key_path=self.gcp_key_path, scopes=self.scopes, disable_logging=True
     File "/opt/airflow/airflow/providers/google/cloud/utils/credentials_provider.py", line 309, in get_credentials_and_project_id
       return _CredentialProvider(*args, **kwargs).get_credentials_and_project()
     File "/opt/airflow/airflow/providers/google/cloud/utils/credentials_provider.py", line 242, in get_credentials_and_project
       credentials, project_id = self._get_credentials_using_adc()
     File "/opt/airflow/airflow/providers/google/cloud/utils/credentials_provider.py", line 295, in _get_credentials_using_adc
       credentials, project_id = google.auth.default(scopes=self.scopes)
     File "/usr/local/lib/python3.6/site-packages/google/auth/_default.py", line 356, in default
       raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
   google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
   ```







[GitHub] [airflow] potiuk merged pull request #13784: Fix four bugs in StackdriverTaskHandler

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #13784:
URL: https://github.com/apache/airflow/pull/13784


   





[GitHub] [airflow] mik-laj commented on a change in pull request #13784: Fix two bugs in StackdriverTaskHandler

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13784:
URL: https://github.com/apache/airflow/pull/13784#discussion_r561285084



##########
File path: airflow/providers/google/cloud/log/stackdriver_task_handler.py
##########
@@ -252,6 +256,8 @@ def _read_logs(
                     log_filter=log_filter, page_token=next_page_token
                 )
                 messages.append(new_messages)
+                if not messages:

Review comment:
       Stackdriver sometimes falls into an endless loop of blank pages.
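
A minimal sketch of a pagination loop that guards against this situation follows. It assumes a `fetch_page` callable returning `(messages, next_page_token)`; that callable, `read_all_pages`, and the `max_pages` cap are hypothetical, and the excerpt above does not show how the real handler decides when to stop.

```python
def read_all_pages(fetch_page, max_pages=100):
    """Collect pages until the token runs out or a blank page appears."""
    messages = []
    token = None
    for _ in range(max_pages):  # hard cap as a second safety net
        new_messages, token = fetch_page(token)
        if not new_messages:
            # A blank page that still carries a token would otherwise
            # keep the loop going forever.
            break
        messages.append(new_messages)
        if not token:
            break
    return "\n".join(messages)


def fake_fetch_page(page_token):
    """Hypothetical backend: two real pages, then blank pages forever."""
    pages = {None: ("first", "t1"), "t1": ("second", "t2")}
    return pages.get(page_token, ("", "t-blank"))


result = read_all_pages(fake_fetch_page)
```

Here the loop stops at the first blank page even though the fake backend keeps returning a token.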







[GitHub] [airflow] github-actions[bot] commented on pull request #13784: Fix two bugs in StackdriverTaskHandler

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13784:
URL: https://github.com/apache/airflow/pull/13784#issuecomment-763828799


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.











[GitHub] [airflow] turbaszek commented on a change in pull request #13784: Fix four bugs in StackdriverTaskHandler

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13784:
URL: https://github.com/apache/airflow/pull/13784#discussion_r563150594



##########
File path: airflow/providers/google/cloud/log/stackdriver_task_handler.py
##########
@@ -188,7 +193,7 @@ def read(
         if next_page_token:
             new_metadata['next_page_token'] = next_page_token
 
-        return [messages], [new_metadata]
+        return [((self.task_instance_hostname, messages),)], [new_metadata]

Review comment:
       Indeed... `List[Tuple[Tuple[str, List]]]`  😄 







[GitHub] [airflow] mik-laj commented on a change in pull request #13784: Fix four bugs in StackdriverTaskHandler

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13784:
URL: https://github.com/apache/airflow/pull/13784#discussion_r568513512



##########
File path: tests/cli/commands/test_info_command.py
##########
@@ -129,6 +130,8 @@ def test_should_read_logging_configuration(self):
             assert "stackdriver" in text
 
     def tearDown(self) -> None:
+        for handler_ref in logging._handlerList[:]:

Review comment:
       This test uses the StackdriverTaskHandler, which tries to connect to GCP in its `close()` method. To avoid this, I remove the handlers manually without calling that method. Similar to: https://github.com/apache/airflow/pull/13784/files#r561283952
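
The idea can be sketched as follows. The sketch relies on `logging._handlerList`, a private CPython list of weak references that `logging.shutdown()` walks at exit, so it may not hold across Python versions. `RemoteCloseHandler` and `drop_handlers_without_close` are hypothetical names, not the helpers added in this PR.

```python
import logging


class RemoteCloseHandler(logging.Handler):
    """Hypothetical handler whose close() would contact a remote service."""

    close_calls = 0

    def emit(self, record):
        pass

    def close(self):
        RemoteCloseHandler.close_calls += 1
        super().close()


def drop_handlers_without_close(handler_type):
    """Unregister handlers of the given type so shutdown() skips them."""
    removed = 0
    # Iterate over a copy because the list is modified inside the loop.
    for handler_ref in logging._handlerList[:]:
        if isinstance(handler_ref(), handler_type):
            logging._handlerList.remove(handler_ref)
            removed += 1
    return removed


handler = RemoteCloseHandler()  # registers itself in logging._handlerList
removed = drop_handlers_without_close(RemoteCloseHandler)
```

Once the weak reference is gone from the list, the exit hook no longer sees the handler, so its `close()` is not invoked at interpreter shutdown.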







[GitHub] [airflow] potiuk commented on a change in pull request #13784: Fix two bugs in StackdriverTaskHandler

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13784:
URL: https://github.com/apache/airflow/pull/13784#discussion_r561164270



##########
File path: airflow/providers/google/cloud/log/stackdriver_task_handler.py
##########
@@ -188,7 +193,7 @@ def read(
         if next_page_token:
             new_metadata['next_page_token'] = next_page_token
 
-        return [messages], [new_metadata]
+        return [((self.task_instance_hostname, messages),)], [new_metadata]

Review comment:
       Indeed... interesting way of returning data :)



