Posted to dev@airflow.apache.org by Andreas Balke <an...@now-extern.de> on 2020/05/06 10:00:53 UTC

CeleryExecutor and Redis

Dear Airflow community, 

Not sure if I'm targeting the right audience here, but trying anyway :)

In a basic setup like the one described here: https://www.cloudwalker.io/2019/09/30/airflow-scale-out-with-redis-and-celery/, using `airflow_version: 1.10.10`, I've run into two issues:

1) Even though `airflow.cfg` is configured to use the `CeleryExecutor` with Redis, and all the logs confirm it is being used, no tasks are visible in Flower. When a DAG is started, it only gets processed via the DB (quick checks below).

2) Frequently, especially while a DAG is running, the UI shows the warning "The scheduler does not appear to be running."
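
For reference, these are the quick checks I plan to use while debugging (paths and host/port assumed from the article's defaults; adjust as needed):

# Which executor and broker the config points at
grep -iE 'executor|broker_url|result_backend' /etc/airflow/airflow.cfg

# Whether Redis is reachable from this box (should answer PONG)
redis-cli -h localhost -p 6379 ping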

Do you have any advice on how to solve these?

Best, Andreas

Re: CeleryExecutor and Redis

Posted by Andreas Balke <an...@now-extern.de>.
> 2) Frequently, especially while a DAG is running, the UI shows the warning "The scheduler does not appear to be running."

The other thing that turned out to be an issue: `SCHEDULER_RUNS` in the environment was set to a value `> -1`. This caused the "The scheduler does not appear to be running." warning and actually led to misbehaviour.
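
For anyone who hits the same thing, roughly the fix (the environment file location comes from the stock systemd unit files, /etc/sysconfig/airflow or /etc/default/airflow depending on distro; adjust to your setup):

# SCHEDULER_RUNS > -1 makes the scheduler exit after that many loops (it maps
# to the -n flag, visible as `-n 5` in the unit status below), so systemd
# keeps restarting it and the UI intermittently sees no scheduler heartbeat
grep SCHEDULER_RUNS /etc/default/airflow

# -1 means "run forever"; set it and restart the unit
sudo sed -i 's/^SCHEDULER_RUNS=.*/SCHEDULER_RUNS=-1/' /etc/default/airflow
sudo systemctl restart airflow-scheduler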

> On 6. May 2020, at 12:24, Andreas Balke <an...@now-extern.de> wrote:
> 
> I just discovered that Flower renders tasks as soon as I use a `BranchPythonOperator`. So this does not appear to be a setup problem.
> 
> 
>> On 6. May 2020, at 12:11, Andreas Balke <andreas.balke@now-extern.de> wrote:
>> 
>> Hi Ash, 
>> 
>> that appears to be OK: 
>> 
>> ● airflow-scheduler.service - Airflow scheduler daemon
>>    Loaded: loaded (/lib/systemd/system/airflow-scheduler.service; enabled; vendor preset: enabled)
>>    Active: active (running) since Wed 2020-05-06 10:09:20 UTC; 3s ago
>>  Main PID: 10610 (/usr/bin/python)
>>     Tasks: 3 (limit: 4667)
>>    CGroup: /system.slice/airflow-scheduler.service
>>            ├─10610 /usr/bin/python3 /usr/local/bin/airflow scheduler -n 5 --pid /run/airflow/scheduler.pid
>>            └─10641 airflow scheduler -- DagFileProcessorManager
>> 
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,119] {scheduler_job.py:1504} DEBUG - Heartbeating the executor
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:122} DEBUG - 0 running task instances
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:123} DEBUG - 0 in queue
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:124} DEBUG - 32 open slots
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.celery_executor.CeleryExecutor'> sync method
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {celery_executor.py:240} DEBUG - No task to query celery, skipping sync
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {scheduler_job.py:1459} DEBUG - Ran scheduling loop in 0.00 seconds
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,121] {scheduler_job.py:1462} DEBUG - Sleeping for 1.00 seconds
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,153] {scheduler_job.py:268} DEBUG - Waiting for <Process(DagFileProcessor1-Process, stopped)>
>> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,234] {settings.py:278} DEBUG - Disposing DB connection pool (PID 10646)
>> 
>> # grep -i execut /etc/airflow/airflow.cfg
>> # The executor class that airflow should use. Choices include
>> # SequentialExecutor, LocalExecutor, CeleryExecutor
>> executor = CeleryExecutor
>> 
>> Best, Andreas
>> 
>> 
>>> On 6. May 2020, at 12:03, Ash Berlin-Taylor <ash@apache.org> wrote:
>>> 
>>> Your second point there would lead me to believe that the scheduler is
>>> still actually running with the default SequentialExecutor.
>>> 
>>> Which config file did you edit?
>>> 
>>> What output is shown when you (re)start the scheduler?
>>> 
>>> Thanks,
>>> -ash
>>> 
>>> On May 6 2020, at 11:00 am, Andreas Balke <andreas.balke@now-extern.de> wrote:
>>> 
>>>> Dear Airflow community,  
>>>> 
>>>> Not sure if I'm targeting the right audience here, but trying anyway :)
>>>> 
>>>> In a basic setup like the one described here:
>>>> https://www.cloudwalker.io/2019/09/30/airflow-scale-out-with-redis-and-celery/,
>>>> using `airflow_version: 1.10.10`, I've run into two issues:
>>>> 
>>>> 1) Even though `airflow.cfg` is configured to use the `CeleryExecutor`
>>>> with Redis, and all the logs confirm it is being used, no tasks are
>>>> visible in Flower. When a DAG is started, it only gets processed via
>>>> the DB.
>>>> 
>>>> 2) Frequently, especially while a DAG is running, the UI shows the
>>>> warning "The scheduler does not appear to be running."
>>>> 
>>>> Do you have any advice on how to solve these?
>>>> 
>>>> Best, Andreas
>> 
> 


Re: CeleryExecutor and Redis

Posted by Andreas Balke <an...@now-extern.de>.
I just discovered that Flower renders tasks as soon as I use a `BranchPythonOperator`. So this does not appear to be a setup problem.
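
In case it helps anyone else, these are the checks I'm using to see whether tasks actually reach the broker (queue name assumed to be the default "default"; adjust if you changed it):

# Ask the Celery workers what they are currently executing
celery -A airflow.executors.celery_executor inspect active

# Peek at the queue length directly in Redis
redis-cli llen default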


> On 6. May 2020, at 12:11, Andreas Balke <an...@now-extern.de> wrote:
> 
> Hi Ash, 
> 
> that appears to be OK: 
> 
> ● airflow-scheduler.service - Airflow scheduler daemon
>    Loaded: loaded (/lib/systemd/system/airflow-scheduler.service; enabled; vendor preset: enabled)
>    Active: active (running) since Wed 2020-05-06 10:09:20 UTC; 3s ago
>  Main PID: 10610 (/usr/bin/python)
>     Tasks: 3 (limit: 4667)
>    CGroup: /system.slice/airflow-scheduler.service
>            ├─10610 /usr/bin/python3 /usr/local/bin/airflow scheduler -n 5 --pid /run/airflow/scheduler.pid
>            └─10641 airflow scheduler -- DagFileProcessorManager
> 
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,119] {scheduler_job.py:1504} DEBUG - Heartbeating the executor
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:122} DEBUG - 0 running task instances
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:123} DEBUG - 0 in queue
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:124} DEBUG - 32 open slots
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.celery_executor.CeleryExecutor'> sync method
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {celery_executor.py:240} DEBUG - No task to query celery, skipping sync
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {scheduler_job.py:1459} DEBUG - Ran scheduling loop in 0.00 seconds
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,121] {scheduler_job.py:1462} DEBUG - Sleeping for 1.00 seconds
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,153] {scheduler_job.py:268} DEBUG - Waiting for <Process(DagFileProcessor1-Process, stopped)>
> May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,234] {settings.py:278} DEBUG - Disposing DB connection pool (PID 10646)
> 
> # grep -i execut /etc/airflow/airflow.cfg
> # The executor class that airflow should use. Choices include
> # SequentialExecutor, LocalExecutor, CeleryExecutor
> executor = CeleryExecutor
> 
> Best, Andreas
> 
> 
>> On 6. May 2020, at 12:03, Ash Berlin-Taylor <ash@apache.org> wrote:
>> 
>> Your second point there would lead me to believe that the scheduler is
>> still actually running with the default SequentialExecutor.
>> 
>> Which config file did you edit?
>> 
>> What output is shown when you (re)start the scheduler?
>> 
>> Thanks,
>> -ash
>> 
>> On May 6 2020, at 11:00 am, Andreas Balke <andreas.balke@now-extern.de> wrote:
>> 
>>> Dear Airflow community,  
>>> 
>>> Not sure if I'm targeting the right audience here, but trying anyway :)
>>> 
>>> In a basic setup like the one described here:
>>> https://www.cloudwalker.io/2019/09/30/airflow-scale-out-with-redis-and-celery/,
>>> using `airflow_version: 1.10.10`, I've run into two issues:
>>> 
>>> 1) Even though `airflow.cfg` is configured to use the `CeleryExecutor`
>>> with Redis, and all the logs confirm it is being used, no tasks are
>>> visible in Flower. When a DAG is started, it only gets processed via
>>> the DB.
>>> 
>>> 2) Frequently, especially while a DAG is running, the UI shows the
>>> warning "The scheduler does not appear to be running."
>>> 
>>> Do you have any advice on how to solve these?
>>> 
>>> Best, Andreas
> 


Re: CeleryExecutor and Redis

Posted by Andreas Balke <an...@now-extern.de>.
Hi Ash, 

that appears to be OK: 

● airflow-scheduler.service - Airflow scheduler daemon
   Loaded: loaded (/lib/systemd/system/airflow-scheduler.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2020-05-06 10:09:20 UTC; 3s ago
 Main PID: 10610 (/usr/bin/python)
    Tasks: 3 (limit: 4667)
   CGroup: /system.slice/airflow-scheduler.service
           ├─10610 /usr/bin/python3 /usr/local/bin/airflow scheduler -n 5 --pid /run/airflow/scheduler.pid
           └─10641 airflow scheduler -- DagFileProcessorManager

May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,119] {scheduler_job.py:1504} DEBUG - Heartbeating the executor
May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:122} DEBUG - 0 running task instances
May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:123} DEBUG - 0 in queue
May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:124} DEBUG - 32 open slots
May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {base_executor.py:133} DEBUG - Calling the <class 'airflow.executors.celery_executor.CeleryExecutor'> sync method
May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {celery_executor.py:240} DEBUG - No task to query celery, skipping sync
May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,120] {scheduler_job.py:1459} DEBUG - Ran scheduling loop in 0.00 seconds
May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,121] {scheduler_job.py:1462} DEBUG - Sleeping for 1.00 seconds
May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,153] {scheduler_job.py:268} DEBUG - Waiting for <Process(DagFileProcessor1-Process, stopped)>
May 06 10:09:24 ip-10-1-17-115 airflow[10610]: [2020-05-06 10:09:24,234] {settings.py:278} DEBUG - Disposing DB connection pool (PID 10646)

# grep -i execut /etc/airflow/airflow.cfg
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor
executor = CeleryExecutor
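
If there's a chance Airflow loads a different file than the one I edited, this one-liner should show it (just a sketch):

python -c "from airflow import configuration; print(configuration.AIRFLOW_CONFIG)"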

Best, Andreas


> On 6. May 2020, at 12:03, Ash Berlin-Taylor <as...@apache.org> wrote:
> 
> Your second point there would lead me to believe that the scheduler is
> still actually running with the default SequentialExecutor.
> 
> Which config file did you edit?
> 
> What output is shown when you (re)start the scheduler?
> 
> Thanks,
> -ash
> 
> On May 6 2020, at 11:00 am, Andreas Balke <an...@now-extern.de> wrote:
> 
>> Dear Airflow community,  
>> 
>> Not sure if I'm targeting the right audience here, but trying anyway :)
>> 
>> In a basic setup like the one described here:
>> https://www.cloudwalker.io/2019/09/30/airflow-scale-out-with-redis-and-celery/,
>> using `airflow_version: 1.10.10`, I've run into two issues:
>> 
>> 1) Even though `airflow.cfg` is configured to use the `CeleryExecutor`
>> with Redis, and all the logs confirm it is being used, no tasks are
>> visible in Flower. When a DAG is started, it only gets processed via
>> the DB.
>> 
>> 2) Frequently, especially while a DAG is running, the UI shows the
>> warning "The scheduler does not appear to be running."
>> 
>> Do you have any advice on how to solve these?
>> 
>> Best, Andreas


Re: CeleryExecutor and Redis

Posted by Ash Berlin-Taylor <as...@apache.org>.
Your second point there would lead me to believe that the scheduler is
still actually running with the default SequentialExecutor.

Which config file did you edit?

What output is shown when you (re)start the scheduler?
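
(On startup the scheduler logs which executor it loaded, a line along the lines of:

INFO - Using executor CeleryExecutor

If that says SequentialExecutor, the file you edited isn't the one being read.)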

Thanks,
-ash

On May 6 2020, at 11:00 am, Andreas Balke <an...@now-extern.de> wrote:

> Dear Airflow community,  
>  
> Not sure if I'm targeting the right audience here, but trying anyway :)
> 
> In a basic setup like the one described here:
> https://www.cloudwalker.io/2019/09/30/airflow-scale-out-with-redis-and-celery/,
> using `airflow_version: 1.10.10`, I've run into two issues:
> 
> 1) Even though `airflow.cfg` is configured to use the `CeleryExecutor`
> with Redis, and all the logs confirm it is being used, no tasks are
> visible in Flower. When a DAG is started, it only gets processed via
> the DB.
> 
> 2) Frequently, especially while a DAG is running, the UI shows the
> warning "The scheduler does not appear to be running."
> 
> Do you have any advice on how to solve these?
>  
> Best, Andreas