You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "csp33 (via GitHub)" <gi...@apache.org> on 2023/02/07 10:44:39 UTC

[GitHub] [airflow] csp33 opened a new pull request, #29401: Configure pools via Helm chart

csp33 opened a new pull request, #29401:
URL: https://github.com/apache/airflow/pull/29401

   closes #11707
   
   Credits to @FloChehab (https://github.com/apache/airflow/pull/15093)
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kaxil commented on pull request #29401: Configure pools via Helm chart

Posted by "kaxil (via GitHub)" <gi...@apache.org>.
kaxil commented on PR #29401:
URL: https://github.com/apache/airflow/pull/29401#issuecomment-1423421713

   > I'm not really a fan of this approach, unfortunately. This has "source of truth" problems in my eyes, as one can still modify these values in the UI. At that point, is the chart right to overwrite it on the next update, or is the UI value the right one? Also, removing a pool here doesn't actually remove it from the db.
   > 
   > There isn't really a "better" option, short of doing some Airflow side enhancements of pools. I'd almost rather waiting for that and only supporting it on newer Airflows instead of supporting something with a number of nuances to it?
   > 
   > I'm curious what other maintainers think.
   
   +1 -- Agreed with @jedcunningham . That is one of the things we said the user-community chart did that we would never do when we start building the official helm chart for Airflow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] closed pull request #29401: Configure pools via Helm chart

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #29401: Configure pools via Helm chart
URL: https://github.com/apache/airflow/pull/29401


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #29401: Configure pools via Helm chart

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #29401:
URL: https://github.com/apache/airflow/pull/29401#issuecomment-1484286065

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] csp33 commented on pull request #29401: Configure pools via Helm chart

Posted by "csp33 (via GitHub)" <gi...@apache.org>.
csp33 commented on PR #29401:
URL: https://github.com/apache/airflow/pull/29401#issuecomment-1422360957

   @potiuk @jedcunningham test are (finally) passing (the ones failing are due to timeouts). Could you please review this PR?
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #29401: Configure pools via Helm chart

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #29401:
URL: https://github.com/apache/airflow/pull/29401#issuecomment-1423860810

   > +1 -- Agreed with @jedcunningham . That is one of the things we said the user-community chart did that we would never do when we start building the official helm chart for Airflow (or so I remember :) )
   
   +1 too. Managing pools this way is a bad idea for the reasons @jedcunningham nicely laid out.
   
   > We had discussion on that. See https://github.com/apache/airflow/issues/18582
   
   As @eladkal mentioned, this would have to redefine the way our SQL queries are done in scheduler. There is a certain complexity of Pools in multi-scheduler context that makes it a bit complex (but not impossible I think). I once assesed this might be hard.  But I looked closely how it works and I **think** possibly we could attempt to change how Pools table is used. I think either code was different when I looked at it in #18582, or maybe I have not thought about this idea before, but let me explain the context:
   
   * Pool table is currenlty locked during scheduling as critical section in `_executable_task_instances_to_queued`:
   ```
           # Get the pool settings. We get a lock on the pool rows, treating this as a "critical section"
           # Throws an exception if lock cannot be obtained, rather than blocking
           pools = Pool.slots_stats(lock_rows=True, session=session)
   ```
   
   * And the `slot_stats` method uses pool to lock the pool table
   ```
           query = session.query(Pool.pool, Pool.slots)
   ````
   
   What happen next in the method then it builds the "PoolStats" dict and marks tasks as executable if they are still have "free slots" (but it uses the in-memory Dict). Only then it releases the lock on Pools.
   
   This is -  in order to accomodate multiple schedulers and the critical section is to make sure schedulers will not mark dag runs as executable when Pools would be over-subscribed.
   
   So while indeed Pools table is locked and used in the query, fundamentally I think we could take the "total" values from ENV variables rather than from Pools table. This is somewhat coincidental that we use "totals" from the Pools table - it's convenient and the code already has Pool objects returned by the query to build PoolStats, but not **really** necessary. 
   The main usage of the Pool table is that whole table is locked while the critical section happens. Locking Pool table also has nice side-effect that when we use UI or CLI to modify pools, we won't change the pool size while scheduler keeps the table locked in critical section - so pool size update will never happen while scheduler is in it.  But with env variables it is impossible to change it without restarting airflow, so it is not really an argument if we use them.  Or we even could have a separate lock for this ciricial section in `_executable_task_instances_to_queued` - it does not have to be Pools table lock.
   
   There is no referential identity between Pools and any other table it seems so I think we would just have to modify slots_stats function to retrieve totals from elsewhere (env vars) and things would work. The `pool` column in TaskInstance is just string (part of an index but not part of referential integrity with Pool table). 
   
   ```
       pool = Column(String(256), nullable=False)
   ```
   
   I still do not like the "multiple source of truth" in this case. But there is potentially an option that we could disable the pool counts in the UI entirely and replace them with pool counts. managed entirely in env vars only.
   
   Also I think changing the approach in this area is something that we want anyway so we could kill two birds with the same stone - see https://github.com/apache/airflow/issues/29416 - our pool count currently does not currently take into account deferred tasks. but there are cases where it would be desired.
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on pull request #29401: Configure pools via Helm chart

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on PR #29401:
URL: https://github.com/apache/airflow/pull/29401#issuecomment-1423745797

   > What if we just make pools optionally configurable with env vars? Sort of like var or conn now. Might screw up some queries but, maybe could be handled. Then you could in effect accomplish same from chart
   
   We had discussion on that. See https://github.com/apache/airflow/issues/18582


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jedcunningham commented on a diff in pull request #29401: Configure pools via Helm chart

Posted by "jedcunningham (via GitHub)" <gi...@apache.org>.
jedcunningham commented on code in PR #29401:
URL: https://github.com/apache/airflow/pull/29401#discussion_r1100813845


##########
chart/templates/jobs/import-pools-job-serviceaccount.yaml:
##########
@@ -0,0 +1,39 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+#############################################
+## Airflow Import Pools Job ServiceAccount
+##############################################
+{{- if .Values.importPoolsJob.serviceAccount.create }}

Review Comment:
   No reason do deploy anything when pools aren't defined.



##########
chart/values.yaml:
##########
@@ -757,6 +757,80 @@ scheduler:
 
   env: []
 
+# Pools that will be added to Airflow (Keys are pools names and values are pools settings)

Review Comment:
   We need to be _very_ clear here that this doesn't actually manage the pools, just does a one off, at time of deploy, import.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #29401: Configure pools via Helm chart

Posted by "dstandish (via GitHub)" <gi...@apache.org>.
dstandish commented on PR #29401:
URL: https://github.com/apache/airflow/pull/29401#issuecomment-1423461770

   What if we just make pools optionally configurable with env vars? Sort of like var or conn now. Might screw up some queries but, maybe could be handled. Then you could in effect accomplish same from chart


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org