Posted to commits@airflow.apache.org by "Kousuke Saruta (Jira)" <ji...@apache.org> on 2020/01/10 11:06:00 UTC
[jira] [Updated] (AIRFLOW-6529) Serialization error occurs when the
scheduler tries to run on macOS.
[ https://issues.apache.org/jira/browse/AIRFLOW-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kousuke Saruta updated AIRFLOW-6529:
------------------------------------
Description:
When we try to run the scheduler on macOS, we get a serialization error like the following.
{code}
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2020-01-10 19:54:41,974] {executor_loader.py:59} INFO - Using executor SequentialExecutor
[2020-01-10 19:54:41,983] {scheduler_job.py:1462} INFO - Starting the scheduler
[2020-01-10 19:54:41,984] {scheduler_job.py:1469} INFO - Processing each file at most -1 times
[2020-01-10 19:54:41,984] {scheduler_job.py:1472} INFO - Searching for files in /Users/sarutak/airflow/dags
[2020-01-10 19:54:42,025] {scheduler_job.py:1474} INFO - There are 27 files in /Users/sarutak/airflow/dags
[2020-01-10 19:54:42,025] {scheduler_job.py:1527} INFO - Resetting orphaned tasks for active dag runs
[2020-01-10 19:54:42,059] {scheduler_job.py:1500} ERROR - Exception when executing execute_helper
Traceback (most recent call last):
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1498, in _execute
    self._execute_helper()
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1531, in _execute_helper
    self.processor_agent.start()
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 348, in start
    self._process.start()
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SchedulerJob._execute.<locals>.processor_factory'
{code}
The reason is that the scheduler tries to run subprocesses using multiprocessing in spawn mode, and spawn mode pickles the objects passed to the child process. In this case, it tries to pickle the inner function `processor_factory`, which is defined locally inside `SchedulerJob._execute` and therefore cannot be pickled.
As of Python 3.8, spawn is the default start method on macOS.
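The failure can be reproduced outside Airflow with a minimal sketch like the one below. The names `Job`, `module_level_factory`, and `can_pickle` are hypothetical and exist only for illustration; they are not Airflow code. The sketch shows that a function defined inside a method cannot be pickled, while a function defined at module level in an importable module usually can. Moving the function to module level is one common workaround, and is not necessarily the fix adopted for this issue.

```python
import multiprocessing
import pickle


def module_level_factory(path):
    """Defined at module scope, so pickle can reference it by qualified name."""
    return f"processor for {path}"


class Job:
    """Hypothetical stand-in for a class that builds a local factory function."""

    def execute(self):
        # A function defined inside a method is a "local object":
        # its qualified name contains "<locals>", which pickle cannot resolve.
        def processor_factory(path):
            return f"processor for {path}"

        return processor_factory


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickling is refused."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    # On macOS with Python 3.8 or later this is expected to report "spawn".
    print("start method:", multiprocessing.get_start_method())
    print("module-level function picklable:", can_pickle(module_level_factory))
    print("local function picklable:", can_pickle(Job().execute()))
```

With the fork start method the child process inherits the parent's memory, so the process object is never pickled and the local function causes no error. That would explain why the problem only appears where spawn is the default.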
was:
When we try to run the scheduler on macOS, we get a serialization error like the following.
{code}
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2020-01-10 19:54:41,974] {executor_loader.py:59} INFO - Using executor SequentialExecutor
[2020-01-10 19:54:41,983] {scheduler_job.py:1462} INFO - Starting the scheduler
[2020-01-10 19:54:41,984] {scheduler_job.py:1469} INFO - Processing each file at most -1 times
[2020-01-10 19:54:41,984] {scheduler_job.py:1472} INFO - Searching for files in /Users/sarutak/airflow/dags
[2020-01-10 19:54:42,025] {scheduler_job.py:1474} INFO - There are 27 files in /Users/sarutak/airflow/dags
[2020-01-10 19:54:42,025] {scheduler_job.py:1527} INFO - Resetting orphaned tasks for active dag runs
[2020-01-10 19:54:42,059] {scheduler_job.py:1500} ERROR - Exception when executing execute_helper
Traceback (most recent call last):
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1498, in _execute
    self._execute_helper()
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1531, in _execute_helper
    self.processor_agent.start()
  File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 348, in start
    self._process.start()
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/python/3.8.1/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SchedulerJob._execute.<locals>.processor_factory'
{code}
The reason is that the scheduler tries to run subprocesses using multiprocessing in spawn mode.
As of Python 3.8, spawn is the default start method on macOS.
> Serialization error occurs when the scheduler tries to run on macOS.
> --------------------------------------------------------------------
>
> Key: AIRFLOW-6529
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6529
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 1.10.8
> Environment: macOS
> Python 3.8
> multiprocessing with spawn mode
> Reporter: Kousuke Saruta
> Assignee: Kousuke Saruta
> Priority: Major