Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/01/10 11:02:00 UTC

[jira] [Commented] (AIRFLOW-6529) Serialization error occurs when the scheduler tries to run on macOS.

    [ https://issues.apache.org/jira/browse/AIRFLOW-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012711#comment-17012711 ] 

ASF GitHub Bot commented on AIRFLOW-6529:
-----------------------------------------

sarutak commented on pull request #7128: [AIRFLOW-6529] Serialization error occurs when the scheduler tries to run on macOS.
URL: https://github.com/apache/airflow/pull/7128
 
 
   When we try to run the scheduler on macOS, we get a serialization error like the following.
   ```
     ____________       _____________
    ____    |__( )_________  __/__  /________      __
   ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
   ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
    _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   [2020-01-10 19:54:41,974] {executor_loader.py:59} INFO - Using executor SequentialExecutor
   [2020-01-10 19:54:41,983] {scheduler_job.py:1462} INFO - Starting the scheduler
   [2020-01-10 19:54:41,984] {scheduler_job.py:1469} INFO - Processing each file at most -1 times
   [2020-01-10 19:54:41,984] {scheduler_job.py:1472} INFO - Searching for files in /Users/sarutak/airflow/dags
   [2020-01-10 19:54:42,025] {scheduler_job.py:1474} INFO - There are 27 files in /Users/sarutak/airflow/dags
   [2020-01-10 19:54:42,025] {scheduler_job.py:1527} INFO - Resetting orphaned tasks for active dag runs
   [2020-01-10 19:54:42,059] {scheduler_job.py:1500} ERROR - Exception when executing execute_helper
   Traceback (most recent call last):
     File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1498, in _execute
       self._execute_helper()
     File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1531, in _execute_helper
       self.processor_agent.start()
     File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 348, in start
       self._process.start()
     File "/opt/python/3.8.1/lib/python3.8/multiprocessing/process.py", line 121, in start
       self._popen = self._Popen(self)
     File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
       return _default_context.get_context().Process._Popen(process_obj)
     File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
       return Popen(process_obj)
     File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
       super().__init__(process_obj)
     File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
       self._launch(process_obj)
     File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
       reduction.dump(process_obj, fp)
     File "/opt/python/3.8.1/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
       ForkingPickler(file, protocol).dump(obj)
   AttributeError: Can't pickle local object 'SchedulerJob._execute.<locals>.processor_factory'
   ```
   
   The reason is that the scheduler tries to start subprocesses with multiprocessing using the spawn start method. As of Python 3.8, spawn is the default start method on macOS.
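   
   The failure can be reproduced outside Airflow: spawn mode pickles the target callable, and pickle serializes functions by qualified name, so a function defined inside another function cannot be re-imported by the spawned child. A minimal sketch (the factory names below are illustrative, not Airflow's actual code):
   
   ```python
   import pickle
   
   def make_local_factory():
       # Nested functions are "local objects": pickle records functions by
       # qualified name, and 'make_local_factory.<locals>.processor_factory'
       # cannot be looked up again from a freshly spawned interpreter.
       def processor_factory():
           return "processor"
       return processor_factory
   
   def module_level_factory():
       # Defined at module top level, so pickle can serialize a reference
       # to it and a spawned child process can re-import it.
       return "processor"
   
   try:
       pickle.dumps(make_local_factory())
       local_is_picklable = True
   except (AttributeError, pickle.PicklingError):
       local_is_picklable = False
   
   print("local function picklable:", local_is_picklable)
   roundtrip = pickle.loads(pickle.dumps(module_level_factory))
   print("module-level roundtrip:", roundtrip())
   ```
   
   In general the error can be avoided either by moving the callable to module level, as above, or by explicitly requesting the fork start method via `multiprocessing.get_context("fork")`.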
   
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   <sup>*</sup> For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Serialization error occurs when the scheduler tries to run on macOS.
> --------------------------------------------------------------------
>
>                 Key: AIRFLOW-6529
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6529
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.8
>         Environment: macOS
> Python 3.8
> multiprocessing with spawn mode
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> When we try to run the scheduler on macOS, we get a serialization error like the following.
> {code}
>   ____________       _____________
>  ____    |__( )_________  __/__  /________      __
> ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
> ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
>  _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
> [2020-01-10 19:54:41,974] {executor_loader.py:59} INFO - Using executor SequentialExecutor
> [2020-01-10 19:54:41,983] {scheduler_job.py:1462} INFO - Starting the scheduler
> [2020-01-10 19:54:41,984] {scheduler_job.py:1469} INFO - Processing each file at most -1 times
> [2020-01-10 19:54:41,984] {scheduler_job.py:1472} INFO - Searching for files in /Users/sarutak/airflow/dags
> [2020-01-10 19:54:42,025] {scheduler_job.py:1474} INFO - There are 27 files in /Users/sarutak/airflow/dags
> [2020-01-10 19:54:42,025] {scheduler_job.py:1527} INFO - Resetting orphaned tasks for active dag runs
> [2020-01-10 19:54:42,059] {scheduler_job.py:1500} ERROR - Exception when executing execute_helper
> Traceback (most recent call last):
>   File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1498, in _execute
>     self._execute_helper()
>   File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1531, in _execute_helper
>     self.processor_agent.start()
>   File "/Users/sarutak/work/oss/airflow-env/master-python3.8.1/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 348, in start
>     self._process.start()
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/process.py", line 121, in start
>     self._popen = self._Popen(self)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
>     return _default_context.get_context().Process._Popen(process_obj)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
>     return Popen(process_obj)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
>     super().__init__(process_obj)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
>     self._launch(process_obj)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
>     reduction.dump(process_obj, fp)
>   File "/opt/python/3.8.1/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
>     ForkingPickler(file, protocol).dump(obj)
> AttributeError: Can't pickle local object 'SchedulerJob._execute.<locals>.processor_factory'
> {code}
> The reason is that the scheduler tries to start subprocesses with multiprocessing using the spawn start method. As of Python 3.8, spawn is the default start method on macOS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)