Posted to commits@airflow.apache.org by "shibataka000 (JIRA)" <ji...@apache.org> on 2019/05/23 04:59:00 UTC
[jira] [Updated] (AIRFLOW-4561) Pod show "FailedAttachVolume warning" when KubernetesExecutor used
[ https://issues.apache.org/jira/browse/AIRFLOW-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shibataka000 updated AIRFLOW-4561:
----------------------------------
Description:
I set up Airflow on GKE following https://github.com/apache/airflow/tree/master/scripts/ci/kubernetes .
Then I ran the tutorial DAG ( https://airflow.apache.org/tutorial.html ) several times.
The first task pod finished successfully, but the second pod did not start.
Running "kubectl describe pod tutorialsleep-08a4650139fb4043ab112d3f4716389d" shows the following events.
{code}
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 3m20s (x65 over 148m) kubelet, gke-sandbox-default-pool-36f8eec9-zcrf Unable to mount volumes for pod "tutorialsleep-08a4650139fb4043ab112d3f4716389d_default(54db6d45-7cff-11e9-885f-42010a9200c8)": timeout expired waiting for volumes to attach or mount for pod "default"/"tutorialsleep-08a4650139fb4043ab112d3f4716389d". list of unmounted volumes=[airflow-dags airflow-logs]. list of unattached volumes=[airflow-dags airflow-logs airflow-config default-token-vrhsd]
Warning FailedAttachVolume 7s (x77 over 150m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-9652f938-7cfe-11e9-885f-42010a9200c8" : googleapi: Error 400: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE - The disk resource 'projects/shibata-dev-230102/zones/asia-northeast1-a/disks/gke-sandbox-1d5bc64e-d-pvc-9652f938-7cfe-11e9-885f-42010a9200c8' is already being used by 'projects/shibata-dev-230102/zones/asia-northeast1-a/instances/gke-sandbox-default-pool-36f8eec9-frv0'
{code}
The "airflow" pod and the "first task" pod run on host "gke-sandbox-default-pool-36f8eec9-frv0".
The "second task" pod runs on host "gke-sandbox-default-pool-36f8eec9-zcrf".
I think host "gke-sandbox-default-pool-36f8eec9-zcrf" cannot mount the volumes because "gke-sandbox-default-pool-36f8eec9-frv0" has already attached them.
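The instance that still holds the disk attachment can be read straight out of the FailedAttachVolume event text. A minimal sketch (the message below is a shortened copy of the event above, not live cluster output):

```shell
# Extract the GCE instance name from a FailedAttachVolume message.
# The message text is a shortened copy of the event shown above.
msg="AttachVolume.Attach failed: the disk is already being used by 'projects/shibata-dev-230102/zones/asia-northeast1-a/instances/gke-sandbox-default-pool-36f8eec9-frv0'"

# The holder is the last path segment inside the quoted instance URL.
node=$(printf '%s' "$msg" | sed -n "s/.*instances\/\([^']*\)'.*/\1/p")
echo "$node"   # gke-sandbox-default-pool-36f8eec9-frv0
```

With the instance name in hand, the attachment can then be confirmed on the GCP side (e.g. "gcloud compute disks describe <disk> --zone <zone>", whose "users" field lists attached instances).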
The access mode of the PV and PVC is "ReadOnlyMany" ( https://github.com/apache/airflow/blob/master/scripts/ci/kubernetes/kube/volumes.yaml#L26 ).
But I think some pods write data to the "dags" and "logs" volumes, and that locks the volume attachment to one node.
One solution is changing the access mode to "ReadWriteMany".
But major volume plugins, for example AWSElasticBlockStore and GCEPersistentDisk, don't support "ReadWriteMany" mode.
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
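For reference, "ReadWriteMany" requires a backend that supports it, such as NFS. A minimal sketch of an RWX volume pair (the server address, export path, and sizes below are hypothetical placeholders, not taken from this setup):

```yaml
# Sketch only: an NFS-backed PV/PVC pair that allows ReadWriteMany.
# Server address and export path are hypothetical.
kind: PersistentVolume
apiVersion: v1
metadata:
  name: airflow-dags
spec:
  accessModes:
    - ReadWriteMany   # NFS supports RWX; GCEPersistentDisk and AWSElasticBlockStore do not
  capacity:
    storage: 2Gi
  nfs:
    server: nfs.example.internal
    path: /exports/airflow/dags
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: airflow-dags
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
```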
> Pod show "FailedAttachVolume warning" when KubernetesExecutor used
> ------------------------------------------------------------------
>
> Key: AIRFLOW-4561
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4561
> Project: Apache Airflow
> Issue Type: Bug
> Components: executor
> Affects Versions: 2.0.0
> Environment: GKE
> Reporter: shibataka000
> Priority: Major
> Attachments: kubectl_command_result.txt
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)