Posted to commits@airflow.apache.org by "shibataka000 (JIRA)" <ji...@apache.org> on 2019/05/23 04:59:00 UTC

[jira] [Updated] (AIRFLOW-4561) Pod show "FailedAttachVolume warning" when KubernetesExecutor used

     [ https://issues.apache.org/jira/browse/AIRFLOW-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shibataka000 updated AIRFLOW-4561:
----------------------------------
    Description: 
I set up Airflow on GKE following https://github.com/apache/airflow/tree/master/scripts/ci/kubernetes .
Then I ran the tutorial DAG ( https://airflow.apache.org/tutorial.html ) several times.
The first pod finished successfully, but the second pod does not work.

Running "kubectl describe pod tutorialsleep-08a4650139fb4043ab112d3f4716389d" shows the following events.

{code}
Events:
  Type     Reason              Age                    From                                             Message
  ----     ------              ----                   ----                                             -------
  Warning  FailedMount         3m20s (x65 over 148m)  kubelet, gke-sandbox-default-pool-36f8eec9-zcrf  Unable to mount volumes for pod "tutorialsleep-08a4650139fb4043ab112d3f4716389d_default(54db6d45-7cff-11e9-885f-42010a9200c8)": timeout expired waiting for volumes to attach or mount for pod "default"/"tutorialsleep-08a4650139fb4043ab112d3f4716389d". list of unmounted volumes=[airflow-dags airflow-logs]. list of unattached volumes=[airflow-dags airflow-logs airflow-config default-token-vrhsd]
  Warning  FailedAttachVolume  7s (x77 over 150m)     attachdetach-controller                          AttachVolume.Attach failed for volume "pvc-9652f938-7cfe-11e9-885f-42010a9200c8" : googleapi: Error 400: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE - The disk resource 'projects/shibata-dev-230102/zones/asia-northeast1-a/disks/gke-sandbox-1d5bc64e-d-pvc-9652f938-7cfe-11e9-885f-42010a9200c8' is already being used by 'projects/shibata-dev-230102/zones/asia-northeast1-a/instances/gke-sandbox-default-pool-36f8eec9-frv0'
{code}

The "airflow" pod and the "first task" pod run on host "gke-sandbox-default-pool-36f8eec9-frv0".
The "second task" pod runs on host "gke-sandbox-default-pool-36f8eec9-zcrf".
I think host "gke-sandbox-default-pool-36f8eec9-zcrf" cannot mount the volumes because "gke-sandbox-default-pool-36f8eec9-frv0" has already mounted them.
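
To illustrate the reasoning above, here is a minimal sketch that extracts the disk name and the instance currently holding it from the FailedAttachVolume message. The function name and the regular expression are my own assumptions for illustration; they are not part of Airflow or kubectl, and the pattern only targets the message format shown in the event above.

```python
import re

# Message text copied from the FailedAttachVolume event above.
MESSAGE = (
    "AttachVolume.Attach failed for volume "
    '"pvc-9652f938-7cfe-11e9-885f-42010a9200c8" : googleapi: Error 400: '
    "RESOURCE_IN_USE_BY_ANOTHER_RESOURCE - The disk resource "
    "'projects/shibata-dev-230102/zones/asia-northeast1-a/disks/"
    "gke-sandbox-1d5bc64e-d-pvc-9652f938-7cfe-11e9-885f-42010a9200c8' "
    "is already being used by "
    "'projects/shibata-dev-230102/zones/asia-northeast1-a/instances/"
    "gke-sandbox-default-pool-36f8eec9-frv0'"
)

# Hypothetical pattern, written only for the message format shown above.
PATTERN = re.compile(
    r"The disk resource '(?P<disk>[^']+)' "
    r"is already being used by '(?P<holder>[^']+)'"
)


def find_disk_holder(message):
    """Return (disk_name, holder_instance), or None if the message does not match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    # Keep only the last path component of each resource path.
    disk = match.group("disk").rsplit("/", 1)[-1]
    holder = match.group("holder").rsplit("/", 1)[-1]
    return disk, holder


if __name__ == "__main__":
    print(find_disk_holder(MESSAGE))
```

The holder it reports is the "frv0" node, while the failing pod was scheduled on the "zcrf" node, which matches the explanation above.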

The access mode of the PV and PVC is "ReadOnlyMany" at https://github.com/apache/airflow/blob/master/scripts/ci/kubernetes/kube/volumes.yaml#L26 .
But I think some pod writes data to the "dags" and "logs" volumes, and this locks the volume mount.

One solution is to change the access mode to "ReadWriteMany".
However, major volume plugins such as AWSElasticBlockStore and GCEPersistentDisk do not support "ReadWriteMany" mode.
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
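
As a rough sketch of the constraint described above, the following encodes the access modes of a few volume plugins as listed in the Kubernetes documentation table linked above. The dictionary and the helper name are assumptions made for illustration only. NFS does not appear in this report and is included just for contrast.

```python
# Access modes per volume plugin, following the table in the Kubernetes
# persistent-volumes documentation. Only a few plugins are listed here.
# NFS is an addition for contrast and is not mentioned in this report.
SUPPORTED_ACCESS_MODES = {
    "AWSElasticBlockStore": {"ReadWriteOnce"},
    "GCEPersistentDisk": {"ReadWriteOnce", "ReadOnlyMany"},
    "NFS": {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"},
}


def supports(plugin, mode):
    """Return True if the given plugin is listed as supporting the access mode."""
    return mode in SUPPORTED_ACCESS_MODES.get(plugin, set())


if __name__ == "__main__":
    for plugin in sorted(SUPPORTED_ACCESS_MODES):
        print(plugin, supports(plugin, "ReadWriteMany"))
```

With this table, "ReadWriteMany" is rejected for both AWSElasticBlockStore and GCEPersistentDisk, so changing the access mode alone would not help on GKE with the default disk type.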

  was:
I construct Airflow on CKE according to https://github.com/apache/airflow/tree/master/scripts/ci/kubernetes .
And I run tutorial dag ( https://airflow.apache.org/tutorial.html ) some times.
First pod finish successfully. But second pod doesn't work.

"kubectl describe pod tutorialsleep-08a4650139fb4043ab112d3f4716389d" command show following event.

{code}
Events:
  Type     Reason              Age                    From                                             Message
  ----     ------              ----                   ----                                             -------
  Warning  FailedMount         3m20s (x65 over 148m)  kubelet, gke-sandbox-default-pool-36f8eec9-zcrf  Unable to mount volumes for pod "tutorialsleep-08a4650139fb4043ab112d3f4716389d_default(54db6d45-7cff-11e9-885f-42010a9200c8)": timeout expired waiting for volumes to attach or mount for pod "default"/"tutorialsleep-08a4650139fb4043ab112d3f4716389d". list of unmounted volumes=[airflow-dags airflow-logs]. list of unattached volumes=[airflow-dags airflow-logs airflow-config default-token-vrhsd]
  Warning  FailedAttachVolume  7s (x77 over 150m)     attachdetach-controller                          AttachVolume.Attach failed for volume "pvc-9652f938-7cfe-11e9-885f-42010a9200c8" : googleapi: Error 400: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE - The disk resource 'projects/shibata-dev-230102/zones/asia-northeast1-a/disks/gke-sandbox-1d5bc64e-d-pvc-9652f938-7cfe-11e9-885f-42010a9200c8' is already being used by 'projects/shibata-dev-230102/zones/asia-northeast1-a/instances/gke-sandbox-default-pool-36f8eec9-frv0'
{code}

"airflow" pod and "first task" pod are run on host "gke-sandbox-default-pool-36f8eec9-frv0". 
"second task" pod are run on host "gke-sandbox-default-pool-36f8eec9-zcrf".
I think host "gke-sandbox-default-pool-36f8eec9-zcrf" can't mount volumes because "gke-sandbox-default-pool-36f8eec9-frv0" already mount them.

Access mode of PV and PVC is "ReadOnlyMany" at https://github.com/apache/airflow/blob/master/scripts/ci/kubernetes/kube/volumes.yaml#L26 .
But I think some pod write data to "dags" and "logs" volumes, and it lock volume mount.

One solution is changing access mode to "ReadWriteMany".
But major volume plugin doesn't support "ReadWritemany" mode, for example AWSElasticBlockStore, GCEPersistentDisk.
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes


> Pod show "FailedAttachVolume warning" when KubernetesExecutor used
> ------------------------------------------------------------------
>
>                 Key: AIRFLOW-4561
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4561
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor
>    Affects Versions: 2.0.0
>         Environment: GKE
>            Reporter: shibataka000
>            Priority: Major
>         Attachments: kubectl_command_result.txt
>


