Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/12/12 11:50:00 UTC

[jira] [Commented] (AIRFLOW-1885) IndexError when polling ready workers and a gunicorn worker becomes a zombie

    [ https://issues.apache.org/jira/browse/AIRFLOW-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287492#comment-16287492 ] 

ASF subversion and git services commented on AIRFLOW-1885:
----------------------------------------------------------

Commit be54f0485eb0ec52b3147bea057b399565601e10 in incubator-airflow's branch refs/heads/master from [~johnbarker]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=be54f04 ]

[AIRFLOW-1885] Fix IndexError in ready_prefix_on_cmdline

If while trying to obtain a list of ready gunicorn workers, one of them
becomes a zombie, psutil.cmdline returns [] (see here:
https://github.com/giampaolo/psutil/blob/release-4.2.0/psutil/_pslinux.py#L1007)

Boom:

    Traceback (most recent call last):
      File "/usr/local/bin/airflow", line 28, in <module>
        args.func(args)
      File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 803, in webserver
        restart_workers(gunicorn_master_proc, num_workers)
      File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 687, in restart_workers
        num_ready_workers_running = get_num_ready_workers_running(gunicorn_master_proc)
      File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 663, in get_num_ready_workers_running
        proc for proc in workers
      File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 664, in <listcomp>
        if settings.GUNICORN_WORKER_READY_PREFIX in proc.cmdline()[0]
    IndexError: list index out of range

So ensure a cmdline is actually returned before doing the cmdline prefix
check in ready_prefix_on_cmdline.

Also:

 * Treat a psutil.NoSuchProcess error as a non-ready worker
 * Add tests for get_num_ready_workers_running

Closes #2844 from j16r/bugfix/poll_zombie_process


> IndexError when polling ready workers and a gunicorn worker becomes a zombie
> ----------------------------------------------------------------------------
>
>                 Key: AIRFLOW-1885
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1885
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: Airflow 1.8
>            Reporter: John Barker
>            Assignee: John Barker
>             Fix For: 1.9.1
>
>
> If one of the gunicorn workers happens to become a zombie between the {{children()}} and {{cmdline()}} calls to psutil, {{get_num_ready_workers_running}} will raise an IndexError:
> {code}
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 28, in <module>
>     args.func(args)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 803, in webserver
>     restart_workers(gunicorn_master_proc, num_workers)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 687, in restart_workers
>     num_ready_workers_running = get_num_ready_workers_running(gunicorn_master_proc)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 663, in get_num_ready_workers_running
>     proc for proc in workers
>   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 664, in <listcomp>
>     if settings.GUNICORN_WORKER_READY_PREFIX in proc.cmdline()[0]
> IndexError: list index out of range
> {code}
> In version 4.2 of psutil, {{cmdline}} can return an empty array if the process is zombied: https://github.com/giampaolo/psutil/blob/release-4.2.0/psutil/_pslinux.py#L1007 so one must ensure that an array is returned with at least one item from {{cmdline}} before doing the {{in}} check.
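
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the committed code: NoSuchProcess and FakeProc are hypothetical stand-ins for psutil.NoSuchProcess and psutil.Process so the example runs without psutil installed, the prefix value is an assumption, and count_ready_workers is a hypothetical helper that takes the worker list directly instead of calling children() on the gunicorn master process.

```python
# Assumed value; the real prefix lives in airflow's settings module.
GUNICORN_WORKER_READY_PREFIX = "[ready] "


class NoSuchProcess(Exception):
    """Hypothetical stand-in for psutil.NoSuchProcess."""


class FakeProc:
    """Hypothetical stand-in for psutil.Process, exposing only cmdline()."""

    def __init__(self, cmdline=None, gone=False):
        self._cmdline = cmdline or []
        self._gone = gone

    def cmdline(self):
        # A process that has already exited raises; a zombie returns [].
        if self._gone:
            raise NoSuchProcess()
        return list(self._cmdline)


def ready_prefix_on_cmdline(proc):
    """Return True only if the worker's first cmdline entry has the prefix."""
    try:
        cmdline = proc.cmdline()
        # Guard against the empty list returned for a zombie process.
        if len(cmdline) > 0:
            return GUNICORN_WORKER_READY_PREFIX in cmdline[0]
    except NoSuchProcess:
        # A worker that vanished between polling calls is simply not ready.
        pass
    return False


def count_ready_workers(workers):
    """Count workers whose cmdline marks them as ready."""
    return len([proc for proc in workers if ready_prefix_on_cmdline(proc)])


workers = [
    FakeProc(["[ready] gunicorn: worker"]),  # ready worker
    FakeProc([]),                            # zombie: empty cmdline
    FakeProc(gone=True),                     # process no longer exists
    FakeProc(["gunicorn: worker"]),          # running but not yet ready
]
print(count_ready_workers(workers))
```

With the unguarded cmdline()[0] lookup, the second worker in this list would raise the IndexError shown in the traceback; with the guard it is counted as not ready.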


