You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Robin Edwards (Jira)" <ji...@apache.org> on 2020/01/15 11:56:00 UTC

[jira] [Updated] (AIRFLOW-6569) Broken sentry integration

     [ https://issues.apache.org/jira/browse/AIRFLOW-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Edwards updated AIRFLOW-6569:
-----------------------------------
    Description: 
I believe the new forking mechanism AIRFLOW-5931 has unintentionally broken the sentry integration.

Sentry relies on the atexit http://man7.org/linux/man-pages/man3/atexit.3.html signal to flush collected errors to their servers. Previously as the task was executed in a new process as opposed to forked this got invoked. However now os._exit() is called (which is semantically correct with child processes) https://docs.python.org/3/library/os.html#os._exit

Point os._exit is called in airflow:
https://github.com/apache/airflow/pull/6627/files#diff-736081a3535ff0b9e60ada2f51154ca4R84

Also related on sentry bug tracker: https://github.com/getsentry/sentry-python/issues/291

Unfortunately sentry doesn't provide (from what i can find) a public interface for flushing errors to their system. The return value of their init() functions returns an object containg a client but the property is `_client` so it would be wrong to rely on it.

I've side stepped this in two ways, you can disable the forking feature through patching CAN_FORK to False. But after seeing the performance improvement on my workers I opted to monkey patch the whole _exec_by_fork() and naughtily call sys.exit instead as a temporary fix.

I personally dont find the actual sentry integration in airflow useful as it doesn't collect errors from the rest of the system only tasks. I've been wiring it in through my log config module since before the integration was added however its still effected by the above change.

My personal vote (unless anyone has a better idea) would be to remove the integration completely document the way of setting it up through the logging class and providing a 'post_execute' hook of some form on the StandardTaskRunner where people can flush errors using what not.



  was:
I believe the new forking mechanism AIRFLOW-5931 has unintentionally broken the sentry integration.

Sentry relies on the atexit http://man7.org/linux/man-pages/man3/atexit.3.html signal to flush collected errors to their servers. Previously as the task was executed in a new process as opposed to forked this got invoked. However now os._exit() is called (which is semantically correct with child processes) https://docs.python.org/3/library/os.html#os._exit

Point os._exit is called in airflow:
https://github.com/apache/airflow/pull/6627/files#diff-736081a3535ff0b9e60ada2f51154ca4R84

Unfortunately sentry doesn't provide (from what i can find) a public interface for flushing errors to their system. The return value of their init() functions returns an object containg a client but the property is `_client` so it would be wrong to rely on it.

I've side stepped this in two ways, you can disable the forking feature through patching CAN_FORK to False. But after seeing the performance improvement on my workers I opted to monkey patch the whole _exec_by_fork() and naughtily call sys.exit instead as a temporary fix.

I personally dont find the actual sentry integration in airflow useful as it doesn't collect errors from the rest of the system only tasks. I've been wiring it in through my log config module since before the integration was added however its still effected by the above change.

My personal vote (unless anyone has a better idea) would be to remove the integration completely document the way of setting it up through the logging class and providing a 'post_execute' hook of some form on the StandardTaskRunner where people can flush errors using what not.



> Broken sentry integration
> -------------------------
>
>                 Key: AIRFLOW-6569
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6569
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: configuration, hooks
>    Affects Versions: 2.0.0, 1.10.7
>            Reporter: Robin Edwards
>            Priority: Minor
>
> I believe the new forking mechanism AIRFLOW-5931 has unintentionally broken the sentry integration.
> Sentry relies on the atexit http://man7.org/linux/man-pages/man3/atexit.3.html signal to flush collected errors to their servers. Previously as the task was executed in a new process as opposed to forked this got invoked. However now os._exit() is called (which is semantically correct with child processes) https://docs.python.org/3/library/os.html#os._exit
> Point os._exit is called in airflow:
> https://github.com/apache/airflow/pull/6627/files#diff-736081a3535ff0b9e60ada2f51154ca4R84
> Also related on sentry bug tracker: https://github.com/getsentry/sentry-python/issues/291
> Unfortunately sentry doesn't provide (from what i can find) a public interface for flushing errors to their system. The return value of their init() functions returns an object containg a client but the property is `_client` so it would be wrong to rely on it.
> I've side stepped this in two ways, you can disable the forking feature through patching CAN_FORK to False. But after seeing the performance improvement on my workers I opted to monkey patch the whole _exec_by_fork() and naughtily call sys.exit instead as a temporary fix.
> I personally dont find the actual sentry integration in airflow useful as it doesn't collect errors from the rest of the system only tasks. I've been wiring it in through my log config module since before the integration was added however its still effected by the above change.
> My personal vote (unless anyone has a better idea) would be to remove the integration completely document the way of setting it up through the logging class and providing a 'post_execute' hook of some form on the StandardTaskRunner where people can flush errors using what not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)