Posted to commits@airflow.apache.org by "Greg Ferrar (JIRA)" <ji...@apache.org> on 2019/07/18 18:06:00 UTC

[jira] [Comment Edited] (AIRFLOW-4991) 401 error when EKS task runs longer than 15 minutes

    [ https://issues.apache.org/jira/browse/AIRFLOW-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888218#comment-16888218 ] 

Greg Ferrar edited comment on AIRFLOW-4991 at 7/18/19 6:05 PM:
---------------------------------------------------------------

Making this hack to pod_launcher.py fixes this:
{code:java}
def read_pod(self, pod):
    print("###GMF in read_pod()")

    #GMF Debug
    print("###GMF Attempting to reload kube client to re-authenticate, by calling get_kube_client")
    self._client = get_kube_client(False, None, '/home/ubuntu/.kube/kubeconfig_gmf-telemetry')
    print("###GMF DONE calling get_kube_client()")

    try:
        return self._client.read_namespaced_pod(pod.name, pod.namespace)
    except BaseHTTPError as e:
        print("###GMF Exception in read_pod()")

        raise AirflowException(
            'There was an error reading the kubernetes API: {}'.format(e)
        )
{code}
This does not seem like the best solution to me. What I'm doing above is recreating the kube client, which indirectly calls `_load_authentication()`. I don't see an immediate pretty way to just reload the authentication.

Also, this reloads the client (and reauthenticates to EKS) *every* time `read_pod()` is called, which I suspect is extreme overkill, and might even break some other non-EKS system.
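
One way to avoid rebuilding the client on every call is to rebuild it only once it has reached a certain age. The sketch below is only an illustration of that idea, not Airflow code: `RefreshingClientHolder`, `client_factory`, and `max_age_seconds` are hypothetical names, and the 600-second default is an arbitrary margin under the 15-minute token lifetime mentioned in the issue description. `client_factory` stands in for whatever builds an authenticated client (for example, a call to `get_kube_client`).

```python
import time


class RefreshingClientHolder:
    """Hypothetical helper: rebuilds a client once it is older than max_age_seconds."""

    def __init__(self, client_factory, max_age_seconds=600, clock=time.monotonic):
        # client_factory is assumed to return a freshly authenticated client.
        self._client_factory = client_factory
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._client = client_factory()
        self._created_at = clock()

    def get(self):
        # Rebuild (and therefore re-authenticate) only when the client is too old.
        if self._clock() - self._created_at >= self._max_age:
            self._client = self._client_factory()
            self._created_at = self._clock()
        return self._client
```

With something like this, `read_pod()` would call `holder.get()` instead of recreating the client unconditionally, so clusters whose tokens do not expire would see one extra rebuild every `max_age_seconds` at most.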

A better approach might be to call `_load_authentication()` (maybe through a new public method `reload_authentication()`) only when there is an exception containing "401", and rely on tenacity to rerun `read_pod()` after the re-authentication.
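
A rough, self-contained sketch of that idea follows. It is not the Airflow implementation: `ApiError`, `PodReader`, `ExpiringClient`, and `client_factory` are hypothetical stand-ins, `reload_authentication()` is the proposed (not yet existing) method, and a plain loop takes the place of tenacity so the example runs with the standard library alone. It also assumes the error object exposes an HTTP status code, rather than matching "401" in the message text.

```python
class ApiError(Exception):
    """Hypothetical stand-in for the HTTP error raised by the Kubernetes client."""

    def __init__(self, status, reason=""):
        super().__init__("({}) {}".format(status, reason))
        self.status = status


class ExpiringClient:
    """Fake client whose credentials stop working after a fixed number of calls."""

    def __init__(self, valid_calls):
        self._remaining = valid_calls

    def read_namespaced_pod(self, name, namespace):
        if self._remaining <= 0:
            raise ApiError(401, "Unauthorized")
        self._remaining -= 1
        return {"name": name, "namespace": namespace}


class PodReader:
    """Hypothetical, simplified stand-in for the pod launcher."""

    def __init__(self, client_factory, max_attempts=2):
        # client_factory plays the role of get_kube_client() in the hack above.
        self._client_factory = client_factory
        self._max_attempts = max_attempts
        self._client = client_factory()

    def reload_authentication(self):
        # Assumed public method; in this sketch it simply rebuilds the client.
        self._client = self._client_factory()

    def read_pod(self, pod_name, namespace):
        for attempt in range(1, self._max_attempts + 1):
            try:
                return self._client.read_namespaced_pod(pod_name, namespace)
            except ApiError as e:
                # Only a 401 triggers re-authentication; any other error,
                # or a 401 on the final attempt, propagates to the caller.
                if e.status != 401 or attempt == self._max_attempts:
                    raise
                self.reload_authentication()


reader = PodReader(lambda: ExpiringClient(valid_calls=1))
reader.read_pod("my-pod", "default")  # succeeds with the first client
reader.read_pod("my-pod", "default")  # hits a 401, re-authenticates, then succeeds
```

Unlike the hack above, this leaves non-EKS setups untouched unless they actually return a 401.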


> 401 error when EKS task runs longer than 15 minutes
> ---------------------------------------------------
>
>                 Key: AIRFLOW-4991
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4991
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: authentication
>    Affects Versions: 1.10.3
>            Reporter: Greg Ferrar
>            Priority: Minor
>
> Using KubernetesOperator with EKS, tasks that run more than 15 minutes result in a `401 Unauthenticated` error after the worker script successfully completes.
> This is due to EKS having a 15-minute timeout on its authentication token.
> A solution is to re-authenticate with EKS at least every fifteen minutes, or maybe just at the end of the job.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)