Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/14 07:32:08 UTC

[GitHub] [airflow] RosterIn opened a new issue #8291: Enforce Connections & Pools

RosterIn opened a new issue #8291: Enforce Connections & Pools
URL: https://github.com/apache/airflow/issues/8291
 
 
   
   
   **Description**
   
   Airflow provides both Pools and Connections, yet integrating them is left to the user.
   
   We have 40+ users who upload DAGs. It's impossible to manually enforce that a user who chooses Connection "X_" must also use Pool "X_P".
   
   We would love a feature that lets you associate a Connection with a Pool, so that whenever a user specifies this Connection in an Operator/Hook, they must also specify the Pool; otherwise the Operator fails!
   
   Or, even better, save users the trouble of specifying the pool in the Operator manually, and have Airflow place the tasks in the pool by itself.
   
   I'm not asking to change the current behavior of Pools; this can be an extension of it. Think of it like queues: the user writing an operator doesn't need to specify which worker queue the tasks will go to. Airflow handles it for them. So why not do the same with Connections and Pools?
   
   **Related Issues**
   
   Moved from https://issues.apache.org/jira/browse/AIRFLOW-4955

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] RosterIn commented on issue #8291: Enforce Connections & Pools

Posted by GitBox <gi...@apache.org>.
RosterIn commented on issue #8291: Enforce Connections & Pools
URL: https://github.com/apache/airflow/issues/8291#issuecomment-613636952
 
 
   I will give it a try.
   
   As for the feature... I really see Pools as an extension of Connections.
   In my opinion they should be combined if users wish to have it like that.
   
   If you see no value in this suggestion, feel free to close this request.


[GitHub] [airflow] mik-laj commented on issue #8291: Enforce Connections & Pools

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8291: Enforce Connections & Pools
URL: https://github.com/apache/airflow/issues/8291#issuecomment-613662328
 
 
   I also see the benefit, but I also understand the technical limitations. We cannot introduce such a feature without limiting the flexibility of operators and hooks. On the other hand, I can see that it could be useful for a certain group of users. As a compromise, pools are tied to specific tasks. Every task can contain arbitrary logic, yet it is still technically possible to limit the number of simultaneous connections to other systems. This is not the best fit for every use case, but it is useful in more of them, and it doesn't have a big impact on performance.


[GitHub] [airflow] boring-cyborg[bot] commented on issue #8291: Enforce Connections & Pools

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #8291: Enforce Connections & Pools
URL: https://github.com/apache/airflow/issues/8291#issuecomment-613274983
 
 
   Thanks for opening your first issue here! Be sure to follow the issue template!
   


[GitHub] [airflow] mik-laj commented on issue #8291: Enforce Connections & Pools

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8291: Enforce Connections & Pools
URL: https://github.com/apache/airflow/issues/8291#issuecomment-613607028
 
 
   A cluster policy can modify a task or DAG after it is loaded. You can make any modification to the task; in particular, you can check which operator it is and which connection it uses, and on that basis set the task's `pool` attribute.
   https://github.com/apache/airflow/blob/master/airflow/models/dagbag.py#L322-L323
   
   There is no universal method for linking a connection to a pool, because conn_id can be specified dynamically during execution. Many operators support Jinja in the _conn_id parameter.
   
   You need to think about which operators your organization uses and create a policy for them that reads ``_conn_id`` from a specific task and sets the pool.
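
The approach described above could be sketched roughly as follows. This is only an illustration, not a tested Airflow configuration: it assumes the policy lives in `airflow_local_settings.py` as a function named `policy(task)`, that operators store their connection ID in an instance attribute whose name ends in `conn_id`, and that the `CONN_TO_POOL` mapping and the `pool_for_task` helper are hypothetical names invented for this example. Templated (Jinja) connection IDs are not resolved at load time, so they simply fail to match and the task keeps its original pool.

```python
# Sketch of airflow_local_settings.py; all names below are assumptions.

# Hypothetical mapping from connection ID to the pool that should guard it.
CONN_TO_POOL = {
    "conn_postgres_a": "pool_a",
    "conn_postgres_b": "pool_b",
}


def pool_for_task(task):
    """Return the pool implied by any ``*conn_id`` attribute, or None."""
    for name, value in vars(task).items():
        # Only plain string values on attributes such as ``postgres_conn_id``
        # or ``conn_id`` are considered; a Jinja template string will not be
        # found in the mapping and is therefore ignored.
        if name.endswith("conn_id") and isinstance(value, str):
            if value in CONN_TO_POOL:
                return CONN_TO_POOL[value]
    return None


def policy(task):
    """Cluster policy hook: override the task's pool when its connection is mapped."""
    pool = pool_for_task(task)
    if pool is not None:
        task.pool = pool
```

A policy like this only sees connection IDs that are attributes of the task itself. A hook instantiated directly inside a Python callable is invisible to it, which is the limitation mentioned above.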


[GitHub] [airflow] mik-laj commented on issue #8291: Enforce Connections & Pools

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8291: Enforce Connections & Pools
URL: https://github.com/apache/airflow/issues/8291#issuecomment-613380583
 
 
   You can currently use a cluster policy to set the appropriate pool for your tasks.
   https://airflow.readthedocs.io/en/latest/concepts.html#cluster-policy


[GitHub] [airflow] RosterIn commented on issue #8291: Enforce Connections & Pools

Posted by GitBox <gi...@apache.org>.
RosterIn commented on issue #8291: Enforce Connections & Pools
URL: https://github.com/apache/airflow/issues/8291#issuecomment-613596062
 
 
   @mik-laj correct me if I'm wrong, but a cluster policy can't enforce a pool. It can enforce queues (Celery).
   If I understand this correctly, the cluster policy kicks in when the task is Running. Pools come before that.
   
   Can you give instructions for something like the following:
   `conn_postgres_a`
   `conn_postgres_b`
   
   
   How can I force anyone using `conn_postgres_a` to use `pool_a`, and anyone using `conn_postgres_b` to use `pool_b`, regardless of whether they are using PostgresOperator, SqlSensor, or any other operator?
   
   I'm looking for a way to bind a `Connection` to a specific `Pool`.
   


[GitHub] [airflow] RosterIn edited a comment on issue #8291: Enforce Connections & Pools

Posted by GitBox <gi...@apache.org>.
RosterIn edited a comment on issue #8291: Enforce Connections & Pools
URL: https://github.com/apache/airflow/issues/8291#issuecomment-613596062
 
 
   @mik-laj correct me if I'm wrong, but a cluster policy can't enforce a pool. It can enforce queues (Celery).
   If I understand this correctly, the cluster policy kicks in when the task is Running. Pools come before that.
   
   Can you give instructions for something like the following:
   `conn_postgres_a`
   `conn_postgres_b`
   
   
   How can I force anyone using `conn_postgres_a` to use `pool_a`, and anyone using `conn_postgres_b` to use `pool_b`, regardless of whether they are using PostgresOperator, SqlSensor, or any other operator, or even using PostgresHook directly?
   
   I'm looking for a way to bind a `Connection` to a specific `Pool`.
   
