Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/04/05 00:25:00 UTC

[jira] [Commented] (AIRFLOW-6778) Add a DAGs PVC Mount Point Option for Workers under Kubernetes Executor

    [ https://issues.apache.org/jira/browse/AIRFLOW-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17075569#comment-17075569 ] 

ASF GitHub Bot commented on AIRFLOW-6778:
-----------------------------------------

brandonwillard commented on pull request #8147: [AIRFLOW-6778] Add a configurable DAGs volume mount path for Kubernetes
URL: https://github.com/apache/airflow/pull/8147
 
 
   This PR introduces a new config option, `kubernetes.extra_volume_mounts`, that allows users to specify multiple Kubernetes volumes to be mounted in each generated worker pod.
   
   This PR replaces #7423 (moved to my personal fork).
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Add a DAGs PVC Mount Point Option for Workers under Kubernetes Executor
> -----------------------------------------------------------------------
>
>                 Key: AIRFLOW-6778
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6778
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: executor-kubernetes, worker
>    Affects Versions: 1.10.6, 1.10.7, 1.10.8, 1.10.9
>            Reporter: Brandon Willard
>            Assignee: Daniel Imberman
>            Priority: Blocker
>              Labels: kubernetes, options
>
> The worker pods generated by the Kubernetes Executor force the DAGs PVC to be mounted at the Airflow DAGs folder.  This, combined with a general inability to specify arbitrary PVCs on workers (see AIRFLOW-3126 and the linked/duplicated issues), severely constrains the usability of worker pods and the Kubernetes Executor as a whole.
>  
> For example, suppose a DAGs-containing PVC is rooted at a Python package (e.g. {{package/}}) that needs to be installed on each worker, with DAGs in {{package/dags/}}, the package install point at {{package/setup.py}}, and the Airflow DAGs location set to {{/airflow/dags}}.  The current static mount point logic then allows only two choices: mount the entire package into the Airflow DAGs location, even though the actual DAGs are in a subdirectory, or mount only the package's sub-path {{package/dags}} (using the existing {{kubernetes.dags_volume_subpath}} config option).  While the latter at least puts the DAGs in the right place, it forgoes the required parent directory, so the files under {{package/}} are unavailable and the package cannot be installed.
>  
> -In general, the only approach that seems to work for the Kubernetes Executor is to specify a worker image with all DAG dependencies pre-loaded, which largely voids the usefulness of a single DAGs PVC that can be dynamically updated.  At best, one can include a {{requirements.txt}} in the PVC and use it in tandem with an entry-point script built into the image, but that still doesn't help with source installations of custom packages stored and updated in a PVC.-
> Edit: This isn't even possible, because worker pods are created using [the {{command}} field instead of {{args}}|https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes]!
>  
> A quick fix for this situation is to allow one to specify the DAGs PVC mount point.  With this option, one can mount the PVC anywhere and specify an Airflow DAGs location that works in conjunction with the mount point (e.g. mount the PVC at {{/airflow/package}} and independently set the Airflow DAGs location to {{/airflow/package/dags}}).  In many cases, this option would also obviate the need for the marginally useful {{kubernetes.dags_volume_subpath}} option.
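
The mount-point idea described in the issue can be sketched roughly as follows. This is only an illustration, not Airflow's implementation: the function `build_dags_mount`, its parameters, the claim name `dags-pvc`, and the volume name `airflow-dags` are all hypothetical. Only the dictionary keys (`persistentVolumeClaim`, `claimName`, `mountPath`) follow the Kubernetes pod spec. The sketch assumes the PVC is mounted at an arbitrary path and that the Airflow DAGs folder must sit at or below that path to be visible to the worker.

```python
from posixpath import commonpath, normpath


def build_dags_mount(claim_name, mount_path, dags_folder, volume_name="airflow-dags"):
    """Return (volume, volume_mount) dicts shaped like Kubernetes pod spec entries.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    mount_path = normpath(mount_path)
    dags_folder = normpath(dags_folder)

    # The DAGs folder is only backed by the PVC if it lies at or below the mount point.
    if commonpath([mount_path, dags_folder]) != mount_path:
        raise ValueError(
            f"DAGs folder {dags_folder!r} is not inside mount point {mount_path!r}"
        )

    # Entry for the pod's `volumes` list: which PVC to use.
    volume = {
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": claim_name},
    }
    # Entry for the container's `volumeMounts` list: where the PVC appears.
    volume_mount = {"name": volume_name, "mountPath": mount_path}
    return volume, volume_mount


# The example from the issue: mount the whole package, point Airflow at its dags/ subdirectory.
volume, volume_mount = build_dags_mount(
    claim_name="dags-pvc",
    mount_path="/airflow/package",
    dags_folder="/airflow/package/dags",
)
```

With the mount point decoupled from the DAGs folder like this, the files under the package root (such as `setup.py`) stay available to the worker, while Airflow still reads DAGs from the subdirectory.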



--
This message was sent by Atlassian Jira
(v8.3.4#803005)