You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/10 11:00:46 UTC

[GitHub] [airflow] ururu-fy opened a new issue #20196: dags folder mount readonly with git-sync true option

ururu-fy opened a new issue #20196:
URL: https://github.com/apache/airflow/issues/20196


   ### Official Helm Chart version
   
   1.3.0 (latest released)
   
   ### Apache Airflow version
   
   2.2.1
   
   ### Kubernetes Version
   
   1.20
   
   ### Helm Chart configuration
   
   # Git sync
   dags:
     persistence:
       # Enable persistent volume for storing dags
       enabled: false
       # Volume size for dags
       size: 1Gi
       # If using a custom storageClass, pass name here
       storageClassName:
       # access mode of the persistent volume
       accessMode: ReadWriteOnce
       ## the name of an existing PVC to use
       existingClaim:
     gitSync:
       enabled: true
       repo: git@bitbucket.org:/data-airflow.git
       branch: test_dags
       rev: HEAD
       depth: 1
       # the number of consecutive failures allowed before aborting
       maxFailures: 1
       # subpath within the repo where dags are located
       # should be "" if dags are at repo root
       subPath: ""
       # if your repo needs a user name password
       # you can load them to a k8s secret like the one below
       #   ---
       #   apiVersion: v1
       #   kind: Secret
       #   metadata:
       #     name: git-credentials-data-airflow
       #   data:
       #     GIT_SYNC_USERNAME: 
       #     GIT_SYNC_PASSWORD: 
       # and specify the name of the secret below
       #
       #credentialsSecret: git-credentials-data-airflow
       #
       #
       # If you are using an ssh clone url, you can load
       # the ssh private key to a k8s secret like the one below
       #   ---
       #   apiVersion: v1
       #   kind: Secret
       #   metadata:
       #     name: airflow-ssh-secret
       #   data:
       #     # key needs to be gitSshKey
       #     gitSshKey: <base64_encoded_data>
       # and specify the name of the secret below
       sshKeySecret: airflow-ssh-secret
       #
       # If you are using an ssh private key, you can additionally
       # specify the content of your known_hosts file, example:
       #
       # knownHosts: |
       #    <host1>,<ip1> <key1>
       #    <host2>,<ip2> <key2>
       # interval between git sync attempts in seconds
       wait: 60
       containerName: git-sync
       uid: 65533
       extraVolumeMounts: []
       env: []
       resources: {}
       #  limits:
       #   cpu: 100m
       #   memory: 128Mi
       #  requests:
       #   cpu: 100m
       #   memory: 128Mi
   
   
   ### Docker Image customisations
   
   _No response_
   
   ### What happened
   
   When I turn on git sync, the dags folder is mounted read-only. If you turn off synchronization, the folder is mounted for reading and writing.
   
   ### What you expected to happen
   
   Mount /opt/airflow/dags in readwrite mode
   
   ### How to reproduce
   
   enable gitSync: true
   
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-991097464


   Even then, your pattern won't work without also having persistence enabled, as each component gets its own emptyDir dags volume. There are also race conditions with your plan - think of the delay between syncing a new change from git (which throws away local changes) and your generator DAG running.
   
   You'd be better off having something else do the DAG generation (e.g. ci/cd), then using git-sync on that output. Or use a PVC instead of gitsync.
   
   I think it's "safer" to only allow read so the wiping behavior isn't surprising. Thoughts @kaxil / @dstandish?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ururu-fy commented on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
ururu-fy commented on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-990880703


   I would like to store master dag file generator in git and sync it to dags folder. Master dag auto generates new dags into the same folder.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ururu-fy edited a comment on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
ururu-fy edited a comment on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-990880703


   I would like to store master dag file generator in git and sync it to dags folder. Master dag must auto generates new dags into the same folder.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #20196:
URL: https://github.com/apache/airflow/issues/20196


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #20196:
URL: https://github.com/apache/airflow/issues/20196


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-991300139


   >I think it's "safer" to only allow read so the wiping behavior isn't surprising. Thoughts @kaxil / @dstandish?
   
   
   Agreed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-992589498


   > The thing is that my dags are very dynamic and can change very often. I'm not worried about their safety, since the dag master takes metadata from the database. Therefore, I see no reason to store them in git. It would be great to be able to the user to change the parameters of these settings depending on his needs.
   
   Then probably you are doing it wrong (and it will not work as you expect them). From the main Airflow page:
   https://airflow.apache.org/docs/apache-airflow/stable/index.html
   
   
   > Workflows are expected to be mostly static or slowly changing. You can think of the structure of the tasks in your workflow as slightly more dynamic than a database structure would be. Airflow workflows are expected to look similar from a run to the next, this allows for clarity around unit of work and continuity.
   
   You should rethink if Airflow is a good choice for you if you want the DAGs to change frequently. The fact that you generate file in DAG folder does not mean that it will be reflected immediately or even very quickly in the DAG structure (it needs to be parsed by DagParser, stored in serialized dag table and scheduled by scheduler). Also it means that  your hist
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-991234377


   yep. This is a very wrong approach (And @jedcunningham had nicely explained why).
   
   Here are our best practices on dynamic dag generation: https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#dynamic-dag-generation - you have couple of options there.
   
   Having the DAG generate python files dynamically sounds like recipe for disaster.  Please follow best practices @ururu-fy .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ururu-fy commented on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
ururu-fy commented on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-992516105


   The thing is that my dags are very dynamic and can change very often. I'm not worried about their safety, since the dag master takes metadata from the database. Therefore, I see no reason to store them in git. It would be great to be able to the user to change the parameters of these settings depending on his needs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-991234377


   yep. This is a very wrong approach (And @jedcunningham had nicely explained why).
   
   Here are our best practices on dynamic dag generation: https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#dynamic-dag-generation - you have couple of options there.
   
   Having the DAG generate python files dynamically sounds like recipe for disaster.  Please follow best practices @ururu-fy .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-992589498


   > The thing is that my dags are very dynamic and can change very often. I'm not worried about their safety, since the dag master takes metadata from the database. Therefore, I see no reason to store them in git. It would be great to be able to the user to change the parameters of these settings depending on his needs.
   
   Then probably you are doing it wrong (and it will not work as you expect them). From the main Airflow page:
   https://airflow.apache.org/docs/apache-airflow/stable/index.html
   
   
   > Workflows are expected to be mostly static or slowly changing. You can think of the structure of the tasks in your workflow as slightly more dynamic than a database structure would be. Airflow workflows are expected to look similar from a run to the next, this allows for clarity around unit of work and continuity.
   
   You should rethink if Airflow is a good choice for you if you want the DAGs to change frequently. The fact that you generate file in DAG folder does not mean that it will be reflected immediately or even very quickly in the DAG structure (it needs to be parsed by DagParser, stored in serialized dag table and scheduled by scheduler). Also it means that  your history of DAG is correct. And most of all - you will never be sure which version of DAG is run at any given time.
   
   DAG Versioning which is on the horizon of  Airflow (but no-one actively works on it) is supposed to handle this case but it's monhs if not years away: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-36+DAG+Versioning
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-992589498






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #20196: dags folder mount readonly with git-sync true option

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #20196:
URL: https://github.com/apache/airflow/issues/20196#issuecomment-991300139


   >I think it's "safer" to only allow read so the wiping behavior isn't surprising. Thoughts @kaxil / @dstandish?
   
   
   Agreed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org