You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@airflow.apache.org by Lewis John McGibbney <le...@apache.org> on 2021/06/08 23:31:19 UTC

Backing up PostgreSQL DB on K8s

Hi users@,
Does anyone have a recommended/best practice/preferred way for backing up PostgreSQL when Airflow is deployed into K8s?
We were thinking of writing a maintenance DAG which would do this... maybe even contributing it to https://github.com/teamclairvoyant/airflow-maintenance-dags.
I'm thinking it would just authenticate into the K8s cluster, find the postgresql pod and then perform a DB archival in s3 or something.
I also looked more into https://www.postgresql.org/docs/9.1/continuous-archiving.html which looks appealing. I'm interest to see what others are doing.
Any suggestions appreciated.
Thank you
lewismc

Re: Backing up PostgreSQL DB on K8s

Posted by Lewis John McGibbney <le...@apache.org>.
Hi Sumit,
Thanks for the suggestion.
We will experiment with extending the Helm chart to bake in in the PVC and PV objects... I can't see anything like this so far digging through the code so I feel it is lacking. Being able to configure this in values.yaml or CLI during install and update would be great.
Thank you

On 2021/06/09 05:41:13, Sumit Maheshwari <ms...@apache.org> wrote: 
> If you are backing up data to safeguard against pod failures, then I
> believe that you can use a PV as data storage for PSql & it would survive
> any pod restarts.
> 
> On Wed, Jun 9, 2021 at 5:01 AM Lewis John McGibbney <le...@apache.org>
> wrote:
> 
> > Hi users@,
> > Does anyone have a recommended/best practice/preferred way for backing up
> > PostgreSQL when Airflow is deployed into K8s?
> > We were thinking of writing a maintenance DAG which would do this... maybe
> > even contributing it to
> > https://github.com/teamclairvoyant/airflow-maintenance-dags.
> > I'm thinking it would just authenticate into the K8s cluster, find the
> > postgresql pod and then perform a DB archival in s3 or something.
> > I also looked more into
> > https://www.postgresql.org/docs/9.1/continuous-archiving.html which looks
> > appealing. I'm interest to see what others are doing.
> > Any suggestions appreciated.
> > Thank you
> > lewismc
> >
> 

Re: Backing up PostgreSQL DB on K8s

Posted by Lewis John McGibbney <le...@apache.org>.
wow OK I didn't know this was an option.
I'll see what I can do and write back if we make progress.
Thank you Daniel.

On 2021/06/10 01:01:23, Daniel Standish <dp...@gmail.com> wrote: 
> > Thanks Daniel. We would need to add this as an option to the K8s Helm
> Chart and that is kinda outside our current scope.
> 
> I may be misunderstanding what you're saying, but there's no need to modify
> the helm chart.
> 
> Generally speaking all airflow helm charts support using a metastore that
> is deployed outside of kubernetes.
> 
> Airflow just needs the creds.
> 
> E.g. with the official helm chart, you can use an external database with
> this configuration (i.e. in helm config.yaml):
> 
> data:
>   metadataSecretName: metastore-uri
>   resultBackendSecretName: result-backend
> 
> pgbouncer:
>   enabled: false
> 
> postgresql:
>   enabled: false
> 
> All that is required is you create k8s secrets with the conn URI.
> 
> The secrets have to be structured like this:
> 
> locals {
>   metastore_uri = ""
> }
> resource "kubernetes_secret" "metastore" {
>   metadata {
>     name = "metastore-uri"
>     namespace = var.environment_name
>   }
>   data = {
>     connection = "postgresql://${local.metastore_uri}"
>   }
>   type = "Opaque"
> }
> 
> resource "kubernetes_secret" "result_backend" {
>   metadata {
>     name = "result-backend"
>     namespace = var.environment_name
>   }
>   data = {
>     connection = "db+${kubernetes_secret.metastore.data.connection}"
>   }
>   type = "Opaque"
> }
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Wed, Jun 9, 2021 at 4:04 PM Lewis John McGibbney <le...@apache.org>
> wrote:
> 
> > Thanks Daniel. We would need to add this as an option to the K8s Helm
> > Chart and that is kinda outside our current scope.
> > Thanks for the suggestion.
> > lewismc
> >
> > On 2021/06/09 05:46:00, Daniel Standish <dp...@gmail.com> wrote:
> > > Perhaps it goes without saying but you might consider using cloud sql
> > > option such as aws rds, which provides persistence even if you destroy
> > and
> > > rebuild your k8s cluster, and of course automated backups.
> > >
> > > On Tue, Jun 8, 2021, 10:41 PM Sumit Maheshwari <ms...@apache.org>
> > wrote:
> > >
> > > > If you are backing up data to safeguard against pod failures, then I
> > > > believe that you can use a PV as data storage for PSql & it would
> > survive
> > > > any pod restarts.
> > > >
> > > > On Wed, Jun 9, 2021 at 5:01 AM Lewis John McGibbney <
> > lewismc@apache.org>
> > > > wrote:
> > > >
> > > >> Hi users@,
> > > >> Does anyone have a recommended/best practice/preferred way for
> > backing up
> > > >> PostgreSQL when Airflow is deployed into K8s?
> > > >> We were thinking of writing a maintenance DAG which would do this...
> > > >> maybe even contributing it to
> > > >> https://github.com/teamclairvoyant/airflow-maintenance-dags.
> > > >> I'm thinking it would just authenticate into the K8s cluster, find the
> > > >> postgresql pod and then perform a DB archival in s3 or something.
> > > >> I also looked more into
> > > >> https://www.postgresql.org/docs/9.1/continuous-archiving.html which
> > > >> looks appealing. I'm interest to see what others are doing.
> > > >> Any suggestions appreciated.
> > > >> Thank you
> > > >> lewismc
> > > >>
> > > >
> > >
> >
> 

Re: Backing up PostgreSQL DB on K8s

Posted by Daniel Standish <dp...@gmail.com>.
> Thanks Daniel. We would need to add this as an option to the K8s Helm
Chart and that is kinda outside our current scope.

I may be misunderstanding what you're saying, but there's no need to modify
the helm chart.

Generally speaking all airflow helm charts support using a metastore that
is deployed outside of kubernetes.

Airflow just needs the creds.

E.g. with the official helm chart, you can use an external database with
this configuration (i.e. in helm config.yaml):

data:
  metadataSecretName: metastore-uri
  resultBackendSecretName: result-backend

pgbouncer:
  enabled: false

postgresql:
  enabled: false

All that is required is you create k8s secrets with the conn URI.

The secrets have to be structured like this:

locals {
  metastore_uri = ""
}
resource "kubernetes_secret" "metastore" {
  metadata {
    name = "metastore-uri"
    namespace = var.environment_name
  }
  data = {
    connection = "postgresql://${local.metastore_uri}"
  }
  type = "Opaque"
}

resource "kubernetes_secret" "result_backend" {
  metadata {
    name = "result-backend"
    namespace = var.environment_name
  }
  data = {
    connection = "db+${kubernetes_secret.metastore.data.connection}"
  }
  type = "Opaque"
}









On Wed, Jun 9, 2021 at 4:04 PM Lewis John McGibbney <le...@apache.org>
wrote:

> Thanks Daniel. We would need to add this as an option to the K8s Helm
> Chart and that is kinda outside our current scope.
> Thanks for the suggestion.
> lewismc
>
> On 2021/06/09 05:46:00, Daniel Standish <dp...@gmail.com> wrote:
> > Perhaps it goes without saying but you might consider using cloud sql
> > option such as aws rds, which provides persistence even if you destroy
> and
> > rebuild your k8s cluster, and of course automated backups.
> >
> > On Tue, Jun 8, 2021, 10:41 PM Sumit Maheshwari <ms...@apache.org>
> wrote:
> >
> > > If you are backing up data to safeguard against pod failures, then I
> > > believe that you can use a PV as data storage for PSql & it would
> survive
> > > any pod restarts.
> > >
> > > On Wed, Jun 9, 2021 at 5:01 AM Lewis John McGibbney <
> lewismc@apache.org>
> > > wrote:
> > >
> > >> Hi users@,
> > >> Does anyone have a recommended/best practice/preferred way for
> backing up
> > >> PostgreSQL when Airflow is deployed into K8s?
> > >> We were thinking of writing a maintenance DAG which would do this...
> > >> maybe even contributing it to
> > >> https://github.com/teamclairvoyant/airflow-maintenance-dags.
> > >> I'm thinking it would just authenticate into the K8s cluster, find the
> > >> postgresql pod and then perform a DB archival in s3 or something.
> > >> I also looked more into
> > >> https://www.postgresql.org/docs/9.1/continuous-archiving.html which
> > >> looks appealing. I'm interest to see what others are doing.
> > >> Any suggestions appreciated.
> > >> Thank you
> > >> lewismc
> > >>
> > >
> >
>

Re: Backing up PostgreSQL DB on K8s

Posted by Lewis John McGibbney <le...@apache.org>.
Thanks Daniel. We would need to add this as an option to the K8s Helm Chart and that is kinda outside our current scope.
Thanks for the suggestion.
lewismc

On 2021/06/09 05:46:00, Daniel Standish <dp...@gmail.com> wrote: 
> Perhaps it goes without saying but you might consider using cloud sql
> option such as aws rds, which provides persistence even if you destroy and
> rebuild your k8s cluster, and of course automated backups.
> 
> On Tue, Jun 8, 2021, 10:41 PM Sumit Maheshwari <ms...@apache.org> wrote:
> 
> > If you are backing up data to safeguard against pod failures, then I
> > believe that you can use a PV as data storage for PSql & it would survive
> > any pod restarts.
> >
> > On Wed, Jun 9, 2021 at 5:01 AM Lewis John McGibbney <le...@apache.org>
> > wrote:
> >
> >> Hi users@,
> >> Does anyone have a recommended/best practice/preferred way for backing up
> >> PostgreSQL when Airflow is deployed into K8s?
> >> We were thinking of writing a maintenance DAG which would do this...
> >> maybe even contributing it to
> >> https://github.com/teamclairvoyant/airflow-maintenance-dags.
> >> I'm thinking it would just authenticate into the K8s cluster, find the
> >> postgresql pod and then perform a DB archival in s3 or something.
> >> I also looked more into
> >> https://www.postgresql.org/docs/9.1/continuous-archiving.html which
> >> looks appealing. I'm interest to see what others are doing.
> >> Any suggestions appreciated.
> >> Thank you
> >> lewismc
> >>
> >
> 

Re: Backing up PostgreSQL DB on K8s

Posted by Daniel Standish <dp...@gmail.com>.
Perhaps it goes without saying but you might consider using cloud sql
option such as aws rds, which provides persistence even if you destroy and
rebuild your k8s cluster, and of course automated backups.

On Tue, Jun 8, 2021, 10:41 PM Sumit Maheshwari <ms...@apache.org> wrote:

> If you are backing up data to safeguard against pod failures, then I
> believe that you can use a PV as data storage for PSql & it would survive
> any pod restarts.
>
> On Wed, Jun 9, 2021 at 5:01 AM Lewis John McGibbney <le...@apache.org>
> wrote:
>
>> Hi users@,
>> Does anyone have a recommended/best practice/preferred way for backing up
>> PostgreSQL when Airflow is deployed into K8s?
>> We were thinking of writing a maintenance DAG which would do this...
>> maybe even contributing it to
>> https://github.com/teamclairvoyant/airflow-maintenance-dags.
>> I'm thinking it would just authenticate into the K8s cluster, find the
>> postgresql pod and then perform a DB archival in s3 or something.
>> I also looked more into
>> https://www.postgresql.org/docs/9.1/continuous-archiving.html which
>> looks appealing. I'm interest to see what others are doing.
>> Any suggestions appreciated.
>> Thank you
>> lewismc
>>
>

Re: Backing up PostgreSQL DB on K8s

Posted by Sumit Maheshwari <ms...@apache.org>.
If you are backing up data to safeguard against pod failures, then I
believe that you can use a PV as data storage for PSql & it would survive
any pod restarts.

On Wed, Jun 9, 2021 at 5:01 AM Lewis John McGibbney <le...@apache.org>
wrote:

> Hi users@,
> Does anyone have a recommended/best practice/preferred way for backing up
> PostgreSQL when Airflow is deployed into K8s?
> We were thinking of writing a maintenance DAG which would do this... maybe
> even contributing it to
> https://github.com/teamclairvoyant/airflow-maintenance-dags.
> I'm thinking it would just authenticate into the K8s cluster, find the
> postgresql pod and then perform a DB archival in s3 or something.
> I also looked more into
> https://www.postgresql.org/docs/9.1/continuous-archiving.html which looks
> appealing. I'm interest to see what others are doing.
> Any suggestions appreciated.
> Thank you
> lewismc
>