You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/12 18:33:47 UTC

[GitHub] [airflow] hubert-pietron opened a new pull request #21540: Fix logging JDBC SQL error when task fails

hubert-pietron opened a new pull request #21540:
URL: https://github.com/apache/airflow/pull/21540


   closes: #16295 
   closes: #18482
   
   JDBC operator could not log errors when task failed. For specific type of error (jpype.java.sql.SQLException, jaydebeapi.DatabaseError, jaydebeapi.InterfaceError) this line https://github.com/apache/airflow/blob/0a2d0d1ecbb7a72677f96bc17117799ab40853e0/airflow/models/taskinstance.py#L1733
   would produce AttributeError
   ```
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING - Traceback (most recent call last):
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING -   File "/opt/***/***/models/taskinstance.py", line 1726, in handle_failure
       self.log.error("Task failed with exception", exc_info=error.__traceback__)
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING -   File "/usr/local/lib/python3.8/logging/__init__.py", line 1475, in error
       self._log(ERROR, msg, args, **kwargs)
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING -   File "/usr/local/lib/python3.8/logging/__init__.py", line 1589, in _log
       self.handle(record)
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING -   File "/usr/local/lib/python3.8/logging/__init__.py", line 1599, in handle
       self.callHandlers(record)
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING -   File "/usr/local/lib/python3.8/logging/__init__.py", line 1661, in callHandlers
       hdlr.handle(record)
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING -   File "/usr/local/lib/python3.8/logging/__init__.py", line 950, in handle
       rv = self.filter(record)
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING -   File "/usr/local/lib/python3.8/logging/__init__.py", line 811, in filter
       result = f.filter(record)
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING -   File "/opt/***/***/utils/log/secrets_masker.py", line 169, in filter
       self._redact_exception_with_context(exc)
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING -   File "/opt/***/***/utils/log/secrets_masker.py", line 150, in _redact_exception_with_context
       exception.args = (self.redact(v) for v in exception.args)
   [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING - AttributeError: can't set attribute
   ``` 
   Catching that error allow to see meaningful log when JDBC task fails.
   
   example DAG
   ```
   from datetime import datetime, timedelta
   from airflow import DAG
   from airflow.providers.jdbc.operators.jdbc import JdbcOperator
   
   with DAG(
       dag_id='jdbc_with_error',
       schedule_interval='0 0 * * *',
       start_date=datetime(2021, 1, 1),
       dagrun_timeout=timedelta(minutes=60),
       tags=['example'],
       catchup=False,
   ) as dag:
   
       start = JdbcOperator(
           task_id='test_task',
           sql='select 1 x y z',
           jdbc_conn_id='my_jdbc_connection',
           autocommit=True,
       )
   ```
   log after change
   ```
   *** Reading local file: /root/airflow/logs/jdbc_with_error/test_task/2022-02-12T15:38:46.534291+00:00/1.log
   [2022-02-12, 15:38:47 UTC] {taskinstance.py:1052} INFO - Dependencies all met for <TaskInstance: jdbc_with_error.test_task manual__2022-02-12T15:38:46.534291+00:00 [queued]>
   [2022-02-12, 15:38:47 UTC] {taskinstance.py:1052} INFO - Dependencies all met for <TaskInstance: jdbc_with_error.test_task manual__2022-02-12T15:38:46.534291+00:00 [queued]>
   [2022-02-12, 15:38:47 UTC] {taskinstance.py:1249} INFO - 
   --------------------------------------------------------------------------------
   [2022-02-12, 15:38:47 UTC] {taskinstance.py:1250} INFO - Starting attempt 1 of 1
   [2022-02-12, 15:38:47 UTC] {taskinstance.py:1251} INFO - 
   --------------------------------------------------------------------------------
   [2022-02-12, 15:38:47 UTC] {taskinstance.py:1270} INFO - Executing <Task(JdbcOperator): test_task> on 2022-02-12 15:38:46.534291+00:00
   [2022-02-12, 15:38:47 UTC] {standard_task_runner.py:52} INFO - Started process 7755 to run task
   [2022-02-12, 15:38:47 UTC] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'jdbc_with_error', 'test_task', 'manual__2022-02-12T15:38:46.534291+00:00', '--job-id', '245', '--raw', '--subdir', 'DAGS_FOLDER/example_dag.py', '--cfg-path', '/tmp/tmp7ryzrpgx', '--error-file', '/tmp/tmp0u3p6q8m']
   [2022-02-12, 15:38:47 UTC] {standard_task_runner.py:77} INFO - Job 245: Subtask test_task
   [2022-02-12, 15:38:47 UTC] {logging_mixin.py:115} INFO - Running <TaskInstance: jdbc_with_error.test_task manual__2022-02-12T15:38:46.534291+00:00 [running]> on host 464ba29d61dc
   [2022-02-12, 15:38:47 UTC] {logging_mixin.py:115} WARNING - /opt/***/***/models/taskinstance.py:830 DeprecationWarning: Passing 'execution_date' to 'XCom.clear()' is deprecated. Use 'run_id' instead.
   [2022-02-12, 15:38:47 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/coercions.py:518 SAWarning: Coercing Subquery object into a select() for use in IN(); please pass a select() construct explicitly
   [2022-02-12, 15:38:48 UTC] {taskinstance.py:1442} INFO - Exporting the following env vars:
   AIRFLOW_CTX_DAG_OWNER=***
   AIRFLOW_CTX_DAG_ID=jdbc_with_error
   AIRFLOW_CTX_TASK_ID=test_task
   AIRFLOW_CTX_EXECUTION_DATE=2022-02-12T15:38:46.534291+00:00
   AIRFLOW_CTX_DAG_RUN_ID=manual__2022-02-12T15:38:46.534291+00:00
   [2022-02-12, 15:38:48 UTC] {jdbc.py:68} INFO - Executing: select 1 x y z
   [2022-02-12, 15:38:48 UTC] {base.py:70} INFO - Using connection to: id: my_jdbc_connection. Host: jdbc:sqlite:elo, Port: None, Schema: , Login: , Password: None, extra: {'extra__jdbc__drv_clsname': 'org.sqlite.JDBC', 'extra__jdbc__drv_path': '/opt/***/sqlite-jdbc-3.36.0.3.jar'}
   [2022-02-12, 15:38:48 UTC] {dbapi.py:213} INFO - Running statement: select 1 x y z, parameters: None
   [2022-02-12, 15:38:48 UTC] {taskinstance.py:1738} ERROR - org.sqlite.SQLiteException: [SQLITE_ERROR] SQL error or missing database (near "y": syntax error)
   [2022-02-12, 15:38:48 UTC] {taskinstance.py:1278} INFO - Marking task as FAILED. dag_id=jdbc_with_error, task_id=test_task, execution_date=20220212T153846, start_date=20220212T153847, end_date=20220212T153848
   [2022-02-12, 15:38:48 UTC] {local_task_job.py:162} INFO - Task exited with return code 1
   [2022-02-12, 15:38:48 UTC] {local_task_job.py:279} INFO - 0 downstream tasks scheduled from follow-on schedule check
   [2022-02-12, 15:38:48 UTC] {logging_mixin.py:115} WARNING - /opt/***/***/models/dag.py:1068 SADeprecationWarning: Query.value() is deprecated and will be removed in a future release.  Please use Query.with_entities() in combination with Query.scalar() (deprecated since: 1.4)
   ```
   
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #21540: Fix logging JDBC SQL error when task fails

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #21540:
URL: https://github.com/apache/airflow/pull/21540#issuecomment-1040863702


   > It looks like the exception that's happening during the handling of the first exception is actually Airflow filtering code (the secret masker specifically):
   > 
   > ```
   >     exception.args = (self.redact(v) for v in exception.args)
   > [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING - AttributeError: can't set attribute
   > ```
   > 
   > I think this code should be patched to handle whatever edge case is occurring here, rather than the caller just catching the exception and logging it another way to work around it.
   > 
   > It looks like @potiuk was the last to touch the `_redact_exception_with_context` method, perhaps he has more context.
   
   Yeah. I refectored a code that had literally this comment:
   
   >   I'm not sure if this is a good idea!
   
   Apparently it was not.
   
   I think the right approach is to make the same try/except but in the place where redact is mdone @hubert-pietron
   
   Just find:
   ```
           exception.args = (self.redact(v) for v in exception.args)
   ```
   
   and surround it with the:
   
   ```
   try:
           exception.args = (self.redact(v) for v in exception.args)
   except AttributeError:
          pass
   ```
   
   The problem is that in some exceptions the args are "read property" and you cannot replace them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21540: Fix logging JDBC SQL error when task fails

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21540:
URL: https://github.com/apache/airflow/pull/21540#discussion_r810687284



##########
File path: airflow/utils/log/secrets_masker.py
##########
@@ -147,7 +147,10 @@ def _record_attrs_to_ignore(self) -> Iterable[str]:
         return frozenset(record.__dict__).difference({'msg', 'args'})
 
     def _redact_exception_with_context(self, exception):
-        exception.args = (self.redact(v) for v in exception.args)
+        try:
+            exception.args = (self.redact(v) for v in exception.args)
+        except AttributeError:

Review comment:
       Add a comment explaining why this is done?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #21540: Fix logging JDBC SQL error when task fails

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #21540:
URL: https://github.com/apache/airflow/pull/21540#issuecomment-1040863702


   > It looks like the exception that's happening during the handling of the first exception is actually Airflow filtering code (the secret masker specifically):
   > 
   > ```
   >     exception.args = (self.redact(v) for v in exception.args)
   > [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING - AttributeError: can't set attribute
   > ```
   > 
   > I think this code should be patched to handle whatever edge case is occurring here, rather than the caller just catching the exception and logging it another way to work around it.
   > 
   > It looks like @potiuk was the last to touch the `_redact_exception_with_context` method, perhaps he has more context.
   
   Yeah. I refectored a code that had literally this comment:
   
   >    # I'm not sure if this is a good idea!
   
   Apparently it was not.
   
   I think the right approach is to make the same try/except but in the place where redact is mdone @hubert-pietron
   
   Just find:
   ```
           exception.args = (self.redact(v) for v in exception.args)
   ```
   
   and surround it with the:
   
   ```
   try:
           exception.args = (self.redact(v) for v in exception.args)
   except AttributeError:
          pass
   ```
   
   The problem is that in some exceptions the args are "read property" and you cannot replace them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21540: Fix logging JDBC SQL error when task fails

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21540:
URL: https://github.com/apache/airflow/pull/21540#discussion_r810807374



##########
File path: airflow/utils/log/secrets_masker.py
##########
@@ -147,7 +147,12 @@ def _record_attrs_to_ignore(self) -> Iterable[str]:
         return frozenset(record.__dict__).difference({'msg', 'args'})
 
     def _redact_exception_with_context(self, exception):
-        exception.args = (self.redact(v) for v in exception.args)
+        # In case, when exception has "read only" properties, we need
+        # to catch an AttributeError to log properly

Review comment:
       ```suggestion
           # An exception may not be modifiable (e.g. declared by an extension module
           # such as JDBC).
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21540: Fix logging JDBC SQL error when task fails

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21540:
URL: https://github.com/apache/airflow/pull/21540#discussion_r810807374



##########
File path: airflow/utils/log/secrets_masker.py
##########
@@ -147,7 +147,12 @@ def _record_attrs_to_ignore(self) -> Iterable[str]:
         return frozenset(record.__dict__).difference({'msg', 'args'})
 
     def _redact_exception_with_context(self, exception):
-        exception.args = (self.redact(v) for v in exception.args)
+        # In case, when exception has "read only" properties, we need
+        # to catch an AttributeError to log properly

Review comment:
       ```suggestion
           # Exception class may not be modifiable (e.g. declared by an extension
           # module such as JDBC).
   ```

##########
File path: airflow/utils/log/secrets_masker.py
##########
@@ -147,7 +147,12 @@ def _record_attrs_to_ignore(self) -> Iterable[str]:
         return frozenset(record.__dict__).difference({'msg', 'args'})
 
     def _redact_exception_with_context(self, exception):
-        exception.args = (self.redact(v) for v in exception.args)
+        # In case, when exception has "read only" properties, we need
+        # to catch an AttributeError to log properly

Review comment:
       ```suggestion
           # Exception class may not be modifiable (e.g. declared by an
           # extension module such as JDBC).
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #21540: Fix logging JDBC SQL error when task fails

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #21540:
URL: https://github.com/apache/airflow/pull/21540#issuecomment-1046532248


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] hubert-pietron commented on pull request #21540: Fix logging JDBC SQL error when task fails

Posted by GitBox <gi...@apache.org>.
hubert-pietron commented on pull request #21540:
URL: https://github.com/apache/airflow/pull/21540#issuecomment-1043297794


   Thanks for response! I pushed changes like you suggested @potiuk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21540: Fix logging JDBC SQL error when task fails

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21540:
URL: https://github.com/apache/airflow/pull/21540#issuecomment-1040863702


   > It looks like the exception that's happening during the handling of the first exception is actually Airflow filtering code (the secret masker specifically):
   > 
   > ```
   >     exception.args = (self.redact(v) for v in exception.args)
   > [2022-02-12, 09:27:16 UTC] {logging_mixin.py:115} WARNING - AttributeError: can't set attribute
   > ```
   > 
   > I think this code should be patched to handle whatever edge case is occurring here, rather than the caller just catching the exception and logging it another way to work around it.
   > 
   > It looks like @potiuk was the last to touch the `_redact_exception_with_context` method, perhaps he has more context.
   
   Yeah. I refectored a code that had literally this comment:
   
   >  # I'm not sure if this is a good idea!
   
   Apparently it was not.
   
   I think the right approach is to make the same try/except but in the place where redact is mdone @hubert-pietron
   
   Just find:
   ```
           exception.args = (self.redact(v) for v in exception.args)
   ```
   
   and surround it with the:
   
   ```
   try:
           exception.args = (self.redact(v) for v in exception.args)
   except AttributeError:
          pass
   ```
   
   The problem is that in some exceptions the args are "read property" and you cannot replace them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #21540: Fix logging JDBC SQL error when task fails

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #21540:
URL: https://github.com/apache/airflow/pull/21540


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org