Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/25 04:07:31 UTC

[GitHub] [airflow] ericpollmann opened a new issue #21082: Scheduler hangs, DAG serialization error under high scheduler load

ericpollmann opened a new issue #21082:
URL: https://github.com/apache/airflow/issues/21082


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
   DAGs on Airflow 2.2.3 run without issue under normal conditions, but when the scheduler is heavily loaded, it hangs with the following error:
   
   <img width="1345" alt="Screen Shot 2022-01-24 at 7 57 43 PM" src="https://user-images.githubusercontent.com/7079390/150908258-2a60b98f-8d35-4a71-9d96-06d8efa5b3f7.png">
   
   In this condition the scheduler hung and did not schedule or run any more tasks, causing scheduled pipelines to back up for hours until detected by monitoring and resolved by human intervention (restarting the scheduler seems to work).
   
   We were able to reproduce the error locally and dump op.params: it was equal to the config that the DAG run was triggered with, a standard Python dictionary with string keys and values.
   
   ### What you expected to happen
   
   No DAG serialization error and scheduler does not hang.
   
   ### How to reproduce
   
   Unfortunately this was challenging: there were no errors or hangs under low or normal load; they only appeared when the scheduler was very heavily loaded (i.e. many thousands of DAG runs per hour).
   
   This was reproducible under Kubernetes (Debian GNU/Linux 10 (buster)) and locally (macOS).
   
   ### Operating System
   
   Debian GNU/Linux 10 (buster)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Single scheduler instance, 4-8 cores. The issue was easily reproducible with our load (thousands of DAG runs per hour), but with increased scheduler resources (3 instances of similar size) it was not easy to reproduce (the error only flickers in briefly and the scheduler doesn't hang).
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #21082: Scheduler hangs, DAG serialization error under high scheduler load

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #21082:
URL: https://github.com/apache/airflow/issues/21082#issuecomment-1020790369


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] sergbelyakov commented on issue #21082: Scheduler hangs, DAG serialization error under high scheduler load (patch)

Posted by GitBox <gi...@apache.org>.
sergbelyakov commented on issue #21082:
URL: https://github.com/apache/airflow/issues/21082#issuecomment-1040054581


   I also experienced this issue after switching from CeleryExecutor to CeleryKubernetesExecutor. Scheduler stopped hanging after switching back to CeleryExecutor. Looks like an issue with Kubernetes Executor.





[GitHub] [airflow] ericpollmann edited a comment on issue #21082: Scheduler hangs, DAG serialization error under high scheduler load

Posted by GitBox <gi...@apache.org>.
ericpollmann edited a comment on issue #21082:
URL: https://github.com/apache/airflow/issues/21082#issuecomment-1020795260


   This makes the issue go away but seems to address the symptom rather than the cause - not sure how an unprocessed Python dict is reaching this code, which assumes params have been processed into a specific format.
   
   ```
   diff --git a/airflow/serialization/serialized_objects.py b/airflow/serialization/serialized_objects.py
   index 63820ffdf..f165b454b 100644
   --- a/airflow/serialization/serialized_objects.py
   +++ b/airflow/serialization/serialized_objects.py
   @@ -466,7 +466,9 @@ class BaseSerialization:
            """Serialize Params dict for a DAG/Task"""
            serialized_params = {}
            for k, v in params.items():
   -            # TODO: As of now, we would allow serialization of params which are of type Param only.
   +            # Old style params, convert it
   +            if not isinstance(v, Param):
   +                v = Param(v)
                try:
                    class_identity = f"{v.__module__}.{v.__class__.__name__}"
                except AttributeError:
   ```
   
   This may be happening somewhere like this?
   https://github.com/apache/airflow/blob/602abe8394fafe7de54df7e73af56de848cdf617/airflow/models/taskinstance.py#L1988
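
For readers without an Airflow checkout, the workaround in the diff above can be sketched in isolation. This is only an illustration under stated assumptions: the `Param` class below is a hypothetical stand-in (not `airflow.models.param.Param`), `serialize_params` is an illustrative name, and the output shape is simplified rather than Airflow's actual serialized format. It shows the one idea in the patch: wrap any plain value in `Param` before the loop body that expects a `Param`.

```python
from typing import Any, Dict


class Param:
    """Hypothetical stand-in for Airflow's Param class (assumption, not the real one)."""

    def __init__(self, default: Any = None) -> None:
        self.default = default


def serialize_params(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Serialize a params mapping, coercing plain values to Param first."""
    serialized: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        # Old-style (plain) value: wrap it so the code below always sees a Param.
        if not isinstance(value, Param):
            value = Param(value)
        class_identity = f"{value.__module__}.{value.__class__.__name__}"
        # Simplified output shape, chosen for this sketch only.
        serialized[key] = {"__class": class_identity, "default": value.default}
    return serialized


if __name__ == "__main__":
    # Made-up trigger config: a plain dict with string keys and values.
    conf = {"env": "staging", "mode": "full"}
    print(serialize_params(conf))
```

As the comment itself notes, this kind of coercion hides the symptom; it does not explain why a plain dict reaches the serializer in the first place.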





[GitHub] [airflow] ericpollmann commented on issue #21082: Scheduler hangs, DAG serialization error under high scheduler load

Posted by GitBox <gi...@apache.org>.
ericpollmann commented on issue #21082:
URL: https://github.com/apache/airflow/issues/21082#issuecomment-1020791338


   We have looked at https://github.com/apache/airflow/issues/20636 - and there is no op.params added to the operators in our DAGs.
   Also aware of https://github.com/apache/airflow/commit/a0cad0725de2c56181f1d9a0b875652ba6ab6361 which doesn't fix this issue, only changes the error message.

