Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/04 12:31:36 UTC

[GitHub] [airflow] ecerulm commented on issue #14598: Provide shared storage between task via pluggable storage providers (akin to S3 remote logging)

ecerulm commented on issue #14598:
URL: https://github.com/apache/airflow/issues/14598#issuecomment-790584181


   My idea was to implement this with pluggable storage so that S3 and other storage backends can be used. 
   My initial idea of how this could work is the following (rough sketch after the list): 
   * When the task starts, it downloads (if it exists) the `artifacts/dag_id/run_id/task_id/artifacts.zip` for each upstream task
   * Unzip them all into a local temporary `artifacts/` directory allocated for the task
   * Run the operator, which has access to the `artifacts/` directory (it gets the path via an API call)
   * When the operator finishes, the current contents of the `artifacts/` directory are packed as a zip and uploaded to remote storage at `artifacts/dag_id/run_id/current_task_id/artifacts.zip`
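   
   To make the flow more concrete, here is a minimal sketch of what the S3 backend of such a pluggable storage could look like. Everything in it is hypothetical (the bucket name, the helper names, the use of boto3 directly); it is not an existing Airflow API, just an illustration of the download/unzip and zip/upload steps described above.
   
```python
# Hypothetical sketch of the proposed flow, not an existing Airflow API.
# Assumes an "artifacts" S3 bucket and boto3; other storage providers
# would plug in behind the same two hooks.
import os
import shutil
import tempfile
import zipfile

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-airflow-artifacts"  # hypothetical bucket name


def _remote_key(dag_id, run_id, task_id):
    return f"artifacts/{dag_id}/{run_id}/{task_id}/artifacts.zip"


def download_upstream_artifacts(dag_id, run_id, upstream_task_ids, artifacts_dir):
    """Before the task runs: fetch and unzip artifacts.zip of each upstream task, if any."""
    for upstream_id in upstream_task_ids:
        key = _remote_key(dag_id, run_id, upstream_id)
        with tempfile.NamedTemporaryFile(suffix=".zip") as tmp:
            try:
                s3.download_file(BUCKET, key, tmp.name)
            except ClientError:
                continue  # upstream task produced no artifacts
            with zipfile.ZipFile(tmp.name) as zf:
                zf.extractall(artifacts_dir)


def upload_task_artifacts(dag_id, run_id, task_id, artifacts_dir):
    """After the task finishes: zip the artifacts directory and push it to remote storage."""
    with tempfile.TemporaryDirectory() as tmpdir:
        archive = shutil.make_archive(os.path.join(tmpdir, "artifacts"), "zip", artifacts_dir)
        s3.upload_file(archive, BUCKET, _remote_key(dag_id, run_id, task_id))
```
   
   The two helpers would be called by the task runner around operator execution, with `artifacts_dir` being the local temporary directory exposed to the operator.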
   
   This is akin to how many CI/CD systems (Travis CI, GitLab CI/CD) work, where the artifacts directory is automatically packed at the end of a job and unpacked at the start of the next one. The difference is that the storage is usually part of the CI/CD solution, whereas here I'm proposing something similar to Airflow's remote logging, where the log location can be configured to point to remote storage.
   

