Posted to commits@airflow.apache.org by "Valeriy (JIRA)" <ji...@apache.org> on 2018/12/18 12:03:00 UTC

[jira] [Comment Edited] (AIRFLOW-3532) Apache Airflow > 1.8 does not work with Celery 4.x, and does not work with Celery 3.x using any transport other than amqp

    [ https://issues.apache.org/jira/browse/AIRFLOW-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723983#comment-16723983 ] 

Valeriy edited comment on AIRFLOW-3532 at 12/18/18 12:02 PM:
-------------------------------------------------------------

[~ashb] Thanks for your reply!
{quote}Are you running the scheduler and the workers on the same node? If not, your broker_url and result_backend will be incorrect.
{quote}
Yes, I'm running the airflow webserver, scheduler and worker on the same node. I also have a second active server with an identical Airflow installation. Redis and PostgreSQL live on dedicated servers, and their fault tolerance is provided by HAProxy. Since Airflow talks to Redis through HAProxy, I raised the client and server timeouts in the HAProxy configuration to 24 hours. The problem with the Airflow DAGs may have been caused by those HAProxy timeouts; I'll keep watching and report back with the results. I suspect Celery simply isn't prepared to re-establish the TCP session that the proxy tears down on the receiving side after the timeout.
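For reference, a minimal sketch of that HAProxy change, assuming a plain TCP listener in front of Redis (the listener name, ports and backend address are placeholders; 24h matches the visibility_timeout below):
{code:java}
# Hypothetical HAProxy snippet: only the two timeout lines reflect the
# change described above; names and addresses are placeholders.
listen redis
    bind *:6400
    mode tcp
    timeout client 24h   # raised so idle Celery connections are not
    timeout server 24h   # dropped by the proxy mid-session
    server redis1 10.0.0.10:6379 check
{code}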

So, my Celery config:

 
{code:java}
[celery]
celery_app_name = airflow.executors.celery_executor
worker_concurrency = 16
worker_log_server_port = 8793
broker_url = redis://redis-server:6400/0
result_backend = db+postgres://airflow:pass@postgres-server:5434/airflow
flower_host = 0.0.0.0
flower_url_prefix = /flower
flower_port = 5555
default_queue = default
celery_config_options = airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
ssl_active = False
ssl_key =
ssl_cert =
ssl_cacert =

[celery_broker_transport_options]
visibility_timeout = 86400{code}
 
{quote}You have config for Flower - what does Flower show about the number of active nodes?
{quote}
I installed Flower and ran the DAGs in Airflow again. I can see successful tasks handled by the Celery executor, but the broker statistics are all zero; maybe a little later Celery will start putting messages into Redis. I can't work out when Celery is going to start putting messages into Redis, but I'm still waiting.
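One way to check that directly (a sketch, assuming the broker settings above; Celery's Redis transport keeps each queue's pending messages in a Redis list named after the queue):
{code:java}
# Hypothetical check against the broker from the config above.
# A non-empty "default" list means messages are reaching Redis.
redis-cli -h redis-server -p 6400 -n 0 LLEN default
redis-cli -h redis-server -p 6400 -n 0 KEYS '*'
{code}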

[screenshot|https://i.ibb.co/YfXL1nR/2018-12-18-14-57-26.png]

I triggered the DAGs manually and I'm waiting to see whether they get stuck in the queued state again. I hope this no longer happens after changing the timeouts in the HAProxy configuration. If you have any thoughts on the information I've provided, I'd be glad to hear your comments. If I hit the problem again, I'll proceed with the Celery diagnostics you described above, along the lines of the sketch below.
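A sketch of those checks, assuming Airflow's Celery app is importable as airflow.executors.celery_executor (the module named in celery_app_name above) and that airflow.cfg supplies the broker settings:
{code:java}
# Hypothetical Celery 4.x diagnostic commands; the -A module path is
# the celery_app_name from the config above.
celery -A airflow.executors.celery_executor status
celery -A airflow.executors.celery_executor inspect ping
celery -A airflow.executors.celery_executor inspect active
{code}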

 



> Apache Airflow > 1.8 does not work with Celery 4.x, and does not work with Celery 3.x using any transport other than amqp
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3532
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3532
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: celery
>    Affects Versions: 1.9.0, 1.10.0, 1.10.1
>            Reporter: Valeriy
>            Priority: Major
>              Labels: celery
>
> I needed Airflow > 1.8, with all the necessary fixes, in a cluster configuration.
> I used Airflow 1.10.0/1.10.1 + Celery 4.2.0/4.2.1 and had problems with running DAGs. After some time, all DAGs would get stuck in the queued state. Restarting the worker resolves the problem. I spent a long time trying to find a solution (logs in DEBUG mode showed nothing) and found a number of discussions, one of them: [https://stackoverflow.com/questions/43524457/airflow-tasks-queued-but-not-running]
> *{color:#d04437}As a result, we conclude that Airflow does not work with Celery 4.x!{color}* The code is not adapted to Celery 4.x.
> I decided to try Celery 3.x and, damn, I got a WARNING:
> {code:java}
> [2018-12-17 15:43:11,136: WARNING/MainProcess] /home/hadoop/youla_airflow/lib/python3.6/site-packages/celery/apps/worker.py:161: CDeprecationWarning:
> Starting from version 3.2 Celery will refuse to accept pickle by default.
> The pickle serializer is a security concern as it may give attackers
> the ability to execute any command.  It's important to secure
> your broker from unauthorized access when using pickle, so we think
> that enabling pickle should require a deliberate action and not be
> the default choice.
> If you depend on pickle then you should set a setting to disable this
> warning and to be sure that everything will continue working
> when you upgrade to Celery 3.2::
>     CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
> You must only enable the serializers that you will actually use.
>   warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
>  
>  -------------- celery@myserver v3.1.26.post2 (Cipater)
> ---- **** -----
> --- * ***  * -- Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.5.1804-Core
> -- * - **** ---
> - ** ---------- [config]
> - ** ---------- .> app:         airflow.executors.celery_executor:0x7f0093b86470
> - ** ---------- .> transport:   amqp://guest:**@localhost:5672//
> - ** ---------- .> results:     disabled://
> - *** --- * --- .> concurrency: 16 (prefork)
> -- ******* ----
> --- ***** ----- [queues]
>  -------------- .> default          exchange=default(direct) key=default
> {code}
> Airflow > 1.8 with Celery 3.x flatly refuses to use any transport other than amqp; note that the banner above reports {{transport: amqp://guest:**@localhost:5672//}} even though broker_url points at Redis. This was already reported here: [http://mail-archives.apache.org/mod_mbox/airflow-commits/201801.mbox/%3CJIRA.13129586.1515519138000.610058.1515519180106@Atlassian.JIRA%3E]
> My Airflow config:
> {code:java}
> [celery]
> celery_app_name = airflow.executors.celery_executor
> worker_concurrency = 16
> worker_log_server_port = 8793
> broker_url = redis://localhost:6400/0
> result_backend = db+postgres://airflow:pass@localhost:5434/airflow
> flower_host = 0.0.0.0
> flower_url_prefix =
> flower_port = 5555
> default_queue = default
> celery_config_options = airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
> {code}
> How do I run Airflow > 1.8 with Celery using Redis as the broker? Is that even possible?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)