You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/24 06:17:17 UTC

[GitHub] [airflow] houqp opened a new pull request #13299: fix 422 invalid value error caused by long k8s pod name

houqp opened a new pull request #13299:
URL: https://github.com/apache/airflow/pull/13299


   K8S pod names follows DNS_SUBDOMAIN naming convention, which can be
   broken down into one or more DNS_LABEL separated by `.`.
   
   While the max length of pod name (DNS_SUBDOMAIN) is 253, each label
   component (DNS_LABEL) of a the name cannot be longer than 63. Pod names
   generated by k8s executor right now only contains one label, which means
   the total effective name length cannot be greater than 63.
   
   This patch concats uuid to pod_id using `.` to generate the pod anem,
   thus extending the max name length to 63 + len(uuid).
   
   Reference: https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/design/identifiers.md
   Relevant discussion: https://github.com/kubernetes/kubernetes/issues/79351#issuecomment-505228196


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #13299:
URL: https://github.com/apache/airflow/pull/13299


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] grepthat commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
grepthat commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-769848160


   @kaxil @houqp Looks good :+1: I checked this on a test DAG with a long task name (via nested Task Groups). Attached is the DAG as a reference:
   
   <details>
   <summary>process_long_taskname.py</summary>
   
   ```python
   from airflow import DAG
   from datetime import timedelta, datetime
   from airflow.operators.bash_operator import BashOperator
   from airflow.utils.task_group import TaskGroup
   
   dag = DAG(
       'process_long_task',
       default_args= {
           'owner': 'airflow',
           'depends_on_past': False,
           'retries' : 0,
           'start_date': datetime(1970, 1, 1),
           'retry_delay': timedelta(seconds=30),
       },
       description='',
       schedule_interval=None,
       catchup=False,
   )
   
   TG_survey00000 = TaskGroup(
       "TG_survey00000", 
       tooltip="", 
       dag=dag
   )
   
   TG_incremental_adjustment_survey00000_f608c63d9b = TaskGroup(
       "TG_incremental_adjustment_survey00000_f608c63d9b", 
       tooltip="", 
       parent_group=TG_survey00000,
       dag=dag
   )
   
   TG_msac_10_survey00000_1c1a34cf10 = TaskGroup(
       "TG_msac_10_survey00000_1c1a34cf10", 
       tooltip="", 
       parent_group=TG_incremental_adjustment_survey00000_f608c63d9b,
       dag=dag
   )
   
   TG_adjuster_786931747d = TaskGroup(
       "TG_bundle_adjuster_786931747d", 
       tooltip="", 
       parent_group=TG_msac_10_survey00000_1c1a34cf10,
       dag=dag
   )
   
   TG_color_0_521cd0b3f7 = TaskGroup(
       "TG_color_0_521cd0b3f7", 
       tooltip="", 
       parent_group=TG_adjuster_786931747d,
       dag=dag
   )
   
   T_finalize_5b57782bb2 = BashOperator(
       task_id='T_finalize_5b57782bb2',
       bash_command='echo "executing nested task && sleep 10"',
       dag=dag,
       task_group=TG_color_0_521cd0b3f7
   )
   ```
   
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r552420231



##########
File path: airflow/kubernetes/pod_generator.py
##########
@@ -370,6 +368,10 @@ def construct_pod(  # pylint: disable=too-many-arguments
         except Exception:  # pylint: disable=W0703
             image = kube_image
 
+        task_id = make_safe_label_value(task_id)
+        dag_id = make_safe_label_value(dag_id)
+        scheduler_job_id = make_safe_label_value(scheduler_job_id)

Review comment:
       scheduler_job_id is annotated as `scheduler_job_id: str,` in construct_pod's function signature as well as in `AirflowKubernetesScheduler` and many other class/function type signatures, do we need to correct the type annotation for all of them?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-756055414


   I don't quite understand the purpose of this change, specifically changing from `-` to `.` in the pod name. Is this just to get "more" of the name in the pod name?
   
   I wonder what that change would do to anyone upgrading from 2.0.0 to 2.0.1 with a running KubeExecutor pod -- would the old pod not be "watchable" anymore?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r552939598



##########
File path: airflow/kubernetes/pod_generator.py
##########
@@ -370,6 +368,10 @@ def construct_pod(  # pylint: disable=too-many-arguments
         except Exception:  # pylint: disable=W0703
             image = kube_image
 
+        task_id = make_safe_label_value(task_id)
+        dag_id = make_safe_label_value(dag_id)
+        scheduler_job_id = make_safe_label_value(scheduler_job_id)

Review comment:
       i do think it would be more consistent if we use int everywhere since that's the type we get from db.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751922396


   Thanks @potiuk for the tips. Based on your findings, I checked the images from my local breeze run.
   
   Here is part of the output from `breeze kind-cluster deploy`:
   
   ```
   Image: "apache/airflow:master-python3.6-kubernetes" with ID "sha256:486ae979be23a58012e5f08b2494fc1903c4cf70d127c837cb9933583130b0e6" not yet present on node "airflow-python-3.6-v1.18.6-worker", loading...
   Image: "apache/airflow:master-python3.6-kubernetes" with ID "sha256:486ae979be23a58012e5f08b2494fc1903c4cf70d127c837cb9933583130b0e6" not yet present on node "airflow-python-3.6-v1.18.6-control-plane", loading...
   ```
   
   Based on the above output, I checked the image with id `486ae979be23a58012e5f08b2494fc1903c4cf70d127c837cb9933583130b0e6`:
   
   ```
   18:12:27 ❯ docker images | grep master-python3.6-kubernetes
   apache/airflow                      master-python3.6-kubernetes              486ae979be23        7 minutes ago       845MB
   18:13:34 ❯ docker run --rm -it 486ae979be23a58012e5f08b2494fc1903c4cf70d127c837cb9933583130b0e6 bash
   airflow@bf3131f32f0d:/opt/airflow$ python
   Python 3.6.12 (default, Dec 11 2020, 15:17:39)
   [GCC 8.3.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from airflow.executors.kubernetes_executor import MAX_POD_ID_LEN
   >>>
   airflow@bf3131f32f0d:/opt/airflow$ grep MAX_POD_ID_LEN  /home/airflow/.local/lib/python3.6/site-packages/airflow/kubernetes/pod_generator.py
   MAX_POD_ID_LEN = 253
           safe_pod_id = pod_id[: MAX_POD_ID_LEN - len(safe_uuid) - 1] + "-" + safe_uuid
   ```
   
   So this image indeed doesn't include my changes. So perhaps local breeze run is different from github action run? I will check the image produced by github action later today.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751416830


   [The Workflow run](https://github.com/apache/airflow/actions/runs/446332928) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-768630826


   @grepthat Can you also please take a look too :) thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751296069


   Byt YEAH.. looks like master has the same problems :(
   
   ============ 32 failed, 23 passed, 2 warnings in 474.24s (0:07:54) =============
   
   
   Something to fix in master then :(


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] grepthat commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
grepthat commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-760103070


   Jumping in on this since I've been looking into the long pod name problem as well 👋 
   
   Will `KubernetesExecutor::try_adopt_task_instances` still work? It will try to recreate a *pod_id* from the database as far as I understand, without making the dag or task id safe first, therefore yielding a wrong pod_id?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751296644


   Hopefully this one will fix it : https://github.com/apache/airflow/pull/13316


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] brandondtb commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
brandondtb commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r552799988



##########
File path: airflow/kubernetes/pod_generator.py
##########
@@ -370,6 +368,10 @@ def construct_pod(  # pylint: disable=too-many-arguments
         except Exception:  # pylint: disable=W0703
             image = kube_image
 
+        task_id = make_safe_label_value(task_id)
+        dag_id = make_safe_label_value(dag_id)
+        scheduler_job_id = make_safe_label_value(scheduler_job_id)

Review comment:
       I'll try to look into this a bit more today and see if I can get a better handle on why the type here is not the expected `str`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-761954447


   @grepthat yes, it will still work because it only uses `{'label_selector': f'airflow-worker={scheduler_job_id}'}` to filter pods.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-756106266


   yes, the dot is to help get more of the name in pod name. i will double check if it's a breaking change. from my test, the scheduler did drain old pods created with the old dash separated label name.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp edited a comment on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-755936064


   @potiuk I think i might have found the k8s test issue.
   
   I confirmed that the docker image built for the k8s test does contain my change:
   
   ```
   3:07:30 ❯ docker run -it docker.pkg.github.com/apache/airflow/master-python3.6:460204505 bash
   Unable to find image 'docker.pkg.github.com/apache/airflow/master-python3.6:460204505' locally
   460204505: Pulling from apache/airflow/master-python3.6
   852e50cd189d: Already exists
   334ed303e4ad: Already exists
   209dcde762c7: Already exists
   f9abeb9ca099: Already exists
   f750bc3b4393: Already exists
   6dfb08d9ecf4: Pull complete
   df1d95662566: Pull complete
   7b5270f850a8: Pull complete
   cd0c014ccf64: Pull complete
   972813881f3c: Pull complete
   6f19c4064553: Pull complete
   e4656094ab4e: Pull complete
   8524ff833442: Pull complete
   a1d0974a0892: Pull complete
   24c4330fe9a2: Pull complete
   ed0c8b4977e8: Pull complete
   Digest: sha256:02e5c5c6175529a988e95143483c20dd42c7754d9219d4bbae9b64d655cb821e
   Status: Downloaded newer image for docker.pkg.github.com/apache/airflow/master-python3.6:460204505
   airflow@56f39731592a:/opt/airflow$ python
   Python 3.6.12 (default, Nov 25 2020, 03:59:00)
   [GCC 8.3.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from airflow.executors.kubernetes_executor import MAX_POD_ID_LEN
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ImportError: cannot import name 'MAX_POD_ID_LEN'
   >>>
   airflow@56f39731592a:/opt/airflow$ grep MAX_POD_ID_LEN  /home/airflow/.local/lib/python3.6/site-packages/airflow/kubernetes/pod_generator.py
   airflow@56f39731592a:/opt/airflow$ exit
   ```
   
   Noticed the image id for this image is `23c1f6187642`:
   
   ```bash
   23:13:31 ❯ docker images |grep docker.pkg.github.com/apache/airflow/master-python3.6
   docker.pkg.github.com/apache/airflow/master-python3.6          460204505                                       23c1f6187642        3 days ago          845MB
   ```
   
   However, if you look into the debug log from k8s test, both web and scheduler pods are launched with a different image with the following id:
   
   ```
     Containers:
       scheduler:
         Container ID:  containerd://ed87933da7e8d8e2b0b79d14c859650c78d040d2f48849369db476ce8ec5a00a
         Image:         apache/airflow:master-python3.6-kubernetes
         Image ID:      sha256:3b65120f9e29723f3dc97b057231ee9cec8e9fd32eddc08897b2501ff90775dd
   ```
   
   Do you know where does that `apache/airflow:master-python3.6-kubernetes` image come from?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751395088


   Main issue has been solved, but I think the two failing errors need to be fixed in this PR @houqp :(


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751841821


   Hey QP. I tested the prod image used (the one that is used as a base for testing k8s ) and it seems that it contains your changes:
   
   ```
   docker run -it docker.pkg.github.com/apache/airflow/master-python3.6:448166283 bash
   airflow@9a15b4ba9935:/opt/airflow$ cd ~/.local/lib/python3.6/site-packages/airflow/executors/
   airflow@9a15b4ba9935:~/.local/lib/python3.6/site-packages/airflow/executors$ grep _make_safe_pod_id kubernetes_executor.py
   airflow@9a15b4ba9935:~/.local/lib/python3.6/site-packages/airflow/executors$ grep airflow.kubernetes.pod_generator  kubernetes_executor.py
   from airflow.kubernetes.pod_generator import PodGenerator
   airflow@9a15b4ba9935:~/.local/lib/python3.6/site-packages/airflow/executors$ python
   Python 3.6.12 (default, Nov 25 2020, 03:59:00) 
   [GCC 8.3.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from airflow.executors.kubernetes_executor import PodGenerator
   >>> from airflow.executors.kubernetes_executor MAX_POD_ID_LEN
     File "<stdin>", line 1
       from airflow.executors.kubernetes_executor MAX_POD_ID_LEN
                                                               ^
   SyntaxError: invalid syntax
   >>> 
   ```
   
   Explanation: 
   All images that we use for testing are automatically stored in GitHub image registry with the RUN_ID tag:
   `docker.pkg.github.com/apache/airflow/master-python3.6:448166283` is the one that was used in case of the last run.
   
   I entered the image (via bash command) and run some greps and imports and seems that everything is as in your PR (removed MAX_POD_ID_LEN, removed _make_safe_pod_id)
   
   What happens next in the tests (you can see it in the logs) the image is tagged apache/airflow:master-python3.6:
   
   ```
   Tagging docker.pkg.github.com/apache/airflow/master-python3.6:448166283 as apache/airflow:master-python3.6
   ```
   
   And in the next step, the same image is used as a base for the K8S image and the image is loaded to the cluster:
   
   Building the K8S image (basically embedding pod templates and example dags in the image):
   
   ```
   Step 1/5 : FROM apache/airflow:master-python3.6
    ---> d83522e8e04d
   Step 2/5 : USER root
    ---> Running in 4c6ab5c14968
   Removing intermediate container 4c6ab5c14968
    ---> ab350a1cf73b
   Step 3/5 : COPY --chown=airflow:root airflow/example_dags/ ${AIRFLOW_HOME}/dags/
    ---> 5bb3d159f5f9
   Step 4/5 : COPY --chown=airflow:root airflow/kubernetes_executor_templates/ ${AIRFLOW_HOME}/pod_templates/
    ---> 12adf141ca1e
   Step 5/5 : USER airflow
    ---> Running in fa63e6afc5b4
   Removing intermediate container fa63e6afc5b4
    ---> 12e5286b9925
   Successfully built 12e5286b9925
   Successfully tagged apache/airflow:master-python3.6-kubernetes
   The apache/airflow:master-python3.6-kubernetes is prepared for test kubernetes deployment.
   ```
   
   And then this image is loaded to the K8S cluster:
   
   ```
   Loading apache/airflow:master-python3.6-kubernetes to airflow-python-3.6-v1.18.6
   ```
   
   So all there seems totally legit.
   
   Can you please rebase once again on top of the master? I improved the logging even more in recent versions and hopefully it will be even clearer what's going on here.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp edited a comment on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751437194


   @potiuk looks like the kind cluster is not picking up the code change in my branch. I am able to reproduce this locally with breeze and the task pod names are created with `-` instead of `.` as uuid prefix. I looked into the scheduler pod config, it's useing `apache/airflow:master-python3.6-kubernetes` as the container image. I also double checked the `/home/airflow/.local/lib/python3.6/site-packages/airflow/kubernetes/pod_generator.py` file in scheduler pod and it indeed doesn't have my change.
   
   For any executor code change, what other changes do I need to make in order to get the kind cluster pick up my code?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] grepthat edited a comment on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
grepthat edited a comment on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-760103070


   Jumping in on this since I've been looking into the long pod name problem as well 👋 
   
   Will `KubernetesExecutor::try_adopt_task_instances` still work? It will try to recreate a *pod_id* from the database entries as far as I understand, without making the dag or task id safe first, therefore yielding a wrong pod_id?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-768624490


   @dimberman Can you take a look at this one


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-755939185


   OK, i checked the image built from the Github Action pipeline. It does contain my changes. So looks like the behavior is different between local breeze run v.s. github action run. @potiuk is there a way for me to reproduce this from my local machine?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-750799759


   [The Workflow run](https://github.com/apache/airflow/actions/runs/441952081) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-763248359


   [The Workflow run](https://github.com/apache/airflow/actions/runs/497373233) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] grepthat commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
grepthat commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-762058706


   @houqp In `adopt_launched_task` a pod id is reconstructed from the 'dag_id' and 'task_id' labels of the pod (that were made safe previously) and then checked if that reconstructed pod id is in the `pod_ids` list.
   https://github.com/apache/airflow/blob/1ec63123c4310f2343dcd7c349f90063c401c0d9/airflow/executors/kubernetes_executor.py#L628-L631
   
   The `pod_ids` list passed to `adopt_launched_task` is a list of pod ids reconstructed from the **raw** 'dag_id' and 'task_id' (without making the IDs safe beforehand)
   https://github.com/apache/airflow/blob/1ec63123c4310f2343dcd7c349f90063c401c0d9/airflow/executors/kubernetes_executor.py#L605-L607
   
   Will this not yield problems?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-755936064


   @potiuk I think i might have found the k8s test issue.
   
   I confirmed that the docker image built for the k8s test does contain the right code:
   
   ```
   3:07:30 ❯ docker run -it docker.pkg.github.com/apache/airflow/master-python3.6:460204505 bash
   Unable to find image 'docker.pkg.github.com/apache/airflow/master-python3.6:460204505' locally
   460204505: Pulling from apache/airflow/master-python3.6
   852e50cd189d: Already exists
   334ed303e4ad: Already exists
   209dcde762c7: Already exists
   f9abeb9ca099: Already exists
   f750bc3b4393: Already exists
   6dfb08d9ecf4: Pull complete
   df1d95662566: Pull complete
   7b5270f850a8: Pull complete
   cd0c014ccf64: Pull complete
   972813881f3c: Pull complete
   6f19c4064553: Pull complete
   e4656094ab4e: Pull complete
   8524ff833442: Pull complete
   a1d0974a0892: Pull complete
   24c4330fe9a2: Pull complete
   ed0c8b4977e8: Pull complete
   Digest: sha256:02e5c5c6175529a988e95143483c20dd42c7754d9219d4bbae9b64d655cb821e
   Status: Downloaded newer image for docker.pkg.github.com/apache/airflow/master-python3.6:460204505
   airflow@56f39731592a:/opt/airflow$ python
   Python 3.6.12 (default, Nov 25 2020, 03:59:00)
   [GCC 8.3.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from airflow.executors.kubernetes_executor import MAX_POD_ID_LEN
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ImportError: cannot import name 'MAX_POD_ID_LEN'
   >>>
   airflow@56f39731592a:/opt/airflow$ grep MAX_POD_ID_LEN  /home/airflow/.local/lib/python3.6/site-packages/airflow/kubernetes/pod_generator.py
   airflow@56f39731592a:/opt/airflow$ exit
   ```
   
   Noticed the image id from this image is `23c1f6187642`:
   
   ```bash
   23:13:31 ❯ docker images |grep docker.pkg.github.com/apache/airflow/master-python3.6
   docker.pkg.github.com/apache/airflow/master-python3.6          460204505                                       23c1f6187642        3 days ago          845MB
   ```
   
   However, if you look into the debug log from k8s test, both web and scheduler pods are launched with a different image with the following id:
   
   ```
     Containers:
       scheduler:
         Container ID:  containerd://ed87933da7e8d8e2b0b79d14c859650c78d040d2f48849369db476ce8ec5a00a
         Image:         apache/airflow:master-python3.6-kubernetes
         Image ID:      sha256:3b65120f9e29723f3dc97b057231ee9cec8e9fd32eddc08897b2501ff90775dd
   ```
   
   Do you know where does that `apache/airflow:master-python3.6-kubernetes` image come from?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] brandondtb commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
brandondtb commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r552377212



##########
File path: airflow/kubernetes/pod_generator.py
##########
@@ -370,6 +368,10 @@ def construct_pod(  # pylint: disable=too-many-arguments
         except Exception:  # pylint: disable=W0703
             image = kube_image
 
+        task_id = make_safe_label_value(task_id)
+        dag_id = make_safe_label_value(dag_id)
+        scheduler_job_id = make_safe_label_value(scheduler_job_id)

Review comment:
       @houqp `scheduler_job_id` needs to be converted to a string before passing it to `make_safe_label_value`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-763329337


   @grepthat @ashb @kaxil @dimberman @brandondtb pushed a commit to add more label sanitizing, ready for another round of review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-756537283


   @dimberman would appreciate your review on this as well :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751415939


   yes, this should fix #13189 @grepthat. thanks @potiuk for the quick fix, let me dig into what's going on with the integration test. It's odd that we have been running with this patch for couple days without any issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751437194


   @potiuk looks like the kind cluster is not picking up the code change in my branch. I am able to reproduce this locally with breeze and the pod names are created with `-` instead of `.` as uuid prefix. I looked into the scheduler pod config, it's useing `apache/airflow:master-python3.6-kubernetes` as the container image. I also double checked the `/home/airflow/.local/lib/python3.6/site-packages/airflow/kubernetes/pod_generator.py` file in scheduler pod and it indeed doesn't have my change.
   
   For any executor code change, what other changes do I need to make in order to get the kind cluster pick up my code?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] brandondtb commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
brandondtb commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r552377212



##########
File path: airflow/kubernetes/pod_generator.py
##########
@@ -370,6 +368,10 @@ def construct_pod(  # pylint: disable=too-many-arguments
         except Exception:  # pylint: disable=W0703
             image = kube_image
 
+        task_id = make_safe_label_value(task_id)
+        dag_id = make_safe_label_value(dag_id)
+        scheduler_job_id = make_safe_label_value(scheduler_job_id)

Review comment:
       @houqp `scheduler_job_id` needs to be converted to a string.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-764143012


   Ping @dimberman 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-768640574


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-752302058


   [The Workflow run](https://github.com/apache/airflow/actions/runs/451836074) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r553262823



##########
File path: tests/kubernetes/test_pod_generator.py
##########
@@ -477,14 +478,36 @@ def test_construct_pod_empty_executor_config(self, mock_uuid):
         worker_config.metadata.annotations = self.annotations
         worker_config.metadata.labels = self.labels
         worker_config.metadata.labels['app'] = 'myapp'
-        worker_config.metadata.name = 'pod_id-' + self.static_uuid.hex
+        worker_config.metadata.name = 'pod_id.' + self.static_uuid.hex
         worker_config.metadata.namespace = 'namespace'
         worker_config.spec.containers[0].env.append(
             k8s.V1EnvVar(name="AIRFLOW_IS_K8S_EXECUTOR_POD", value='True')
         )
         worker_config_result = self.k8s_client.sanitize_for_serialization(worker_config)
         self.assertEqual(worker_config_result, sanitized_result)
 
+    def ensure_max_label_length(self):
+        path = sys.path[0] + '/tests/kubernetes/pod_generator_base_with_secrets.yaml'

Review comment:
       ```suggestion
           path = os.path.join(os.path.dirname(__file__), 'pod_generator_base_with_secrets.yaml')
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp removed a comment on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp removed a comment on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751876192


   One difference I noticed between github action and local breeze is in github action run, scheduler pod is failing health check and has the following log:
   
   ```
     [2020-12-28 05:10:13,215] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 35
     [2020-12-28 05:10:13,740] {settings.py:290} DEBUG - Disposing DB connection pool (PID 7057)
     [2020-12-28 05:10:13,752] {settings.py:290} DEBUG - Disposing DB connection pool (PID 7064)
     [2020-12-28 05:10:13,756] {process_utils.py:201} INFO - Waiting up to 5 seconds for processes to exit...
     [2020-12-28 05:10:13,762] {process_utils.py:201} INFO - Waiting up to 5 seconds for processes to exit...
     [2020-12-28 05:10:13,768] {process_utils.py:61} INFO - Process psutil.Process(pid=7057, status='terminated', started='05:10:12') (7057) terminated with exit code None
     [2020-12-28 05:10:13,769] {process_utils.py:61} INFO - Process psutil.Process(pid=7064, status='terminated', started='05:10:12') (7064) terminated with exit code None
     [2020-12-28 05:10:13,769] {process_utils.py:61} INFO - Process psutil.Process(pid=35, status='terminated', exitcode=0, started='05:07:55') (35) terminated with exit code 0
     [2020-12-28 05:10:13,770] {scheduler_job.py:1297} INFO - Exited execute loop
     [2020-12-28 05:10:13,775] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
   ```
   
   But I am not able to reproduce this in local kubernetes test run at all. However, both local and remote run all failed on the same set of tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r548404758



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -367,24 +367,6 @@ def _annotations_to_key(self, annotations: Dict[str, str]) -> Optional[TaskInsta
 
         return TaskInstanceKey(dag_id, task_id, execution_date, try_number)
 
-    @staticmethod
-    def _make_safe_pod_id(safe_dag_id: str, safe_task_id: str, safe_uuid: str) -> str:

Review comment:
       removing since this method is not being used anywhere.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r549233503



##########
File path: kubernetes_tests/test_kubernetes_executor.py
##########
@@ -55,6 +55,14 @@ def _describe_resources(namespace: str):
         subprocess.call(["kubectl", "describe", "pvc", "--namespace", namespace])
         print("=" * 80)
 
+    @staticmethod
+    def _show_scheduler_log():
+        print(">" * 80)
+        subprocess.call(
+            ["kubectl", "logs", "-l", "component=scheduler", "-c", "scheduler", "--namespace", "airflow"]

Review comment:
       Is this namespace correct in all cases? I think the user can change this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751294439


   Not THAT flaky I think . This looks like legit problem


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r553262286



##########
File path: kubernetes_tests/test_kubernetes_executor.py
##########
@@ -122,6 +156,9 @@ def monitor_task(self, host, execution_date, dag_id, task_id, expected_final_sta
                     break
                 self._describe_resources(namespace="airflow")
                 self._describe_resources(namespace="default")
+                self._show_scheduler_log()

Review comment:
       Left over debugging?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751451490


   This should work out-of-the-box @houqp. There was a recent change though where the production images are built from packages rather than directly from sources. But the packages are locally prepared using the PR sources, so it should - in-principle - work fine. But I will take a look.
   
   Kind request: rhis change https://github.com/apache/airflow/pull/13323 should vastly help in being able to analyse it much faster. It introduces grouping of the logs so that it will be much easier to analyse any problems. 
   
   If you can take a look and we merge this one and you rebase your change on top, that would be much easier to analyse it :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751974078


   [The Workflow run](https://github.com/apache/airflow/actions/runs/450269034) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-762459414


   @grepthat i see what you meant. yes, that pod_ids dict construction code will need to be changed to use label safe values as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r553265547



##########
File path: tests/kubernetes/test_pod_generator.py
##########
@@ -477,14 +478,36 @@ def test_construct_pod_empty_executor_config(self, mock_uuid):
         worker_config.metadata.annotations = self.annotations
         worker_config.metadata.labels = self.labels
         worker_config.metadata.labels['app'] = 'myapp'
-        worker_config.metadata.name = 'pod_id-' + self.static_uuid.hex
+        worker_config.metadata.name = 'pod_id.' + self.static_uuid.hex
         worker_config.metadata.namespace = 'namespace'
         worker_config.spec.containers[0].env.append(
             k8s.V1EnvVar(name="AIRFLOW_IS_K8S_EXECUTOR_POD", value='True')
         )
         worker_config_result = self.k8s_client.sanitize_for_serialization(worker_config)
         self.assertEqual(worker_config_result, sanitized_result)
 
+    def ensure_max_label_length(self):
+        path = sys.path[0] + '/tests/kubernetes/pod_generator_base_with_secrets.yaml'
+        worker_config = PodGenerator.deserialize_model_file(path)
+
+        result = PodGenerator.construct_pod(
+            dag_id='a' * 512,
+            task_id='a' * 512,
+            pod_id='a' * 512,
+            kube_image='a' * 512,
+            try_number=3,
+            date=self.execution_date,
+            args=['command'],
+            namespace='namespace',
+            scheduler_job_id='a' * 512,
+            pod_override_object=None,
+            base_worker_pod=worker_config,
+        )
+
+        assert result.metadata.name < 253

Review comment:
       Could we also add a specific check here (`assert result.metadata.name == 'aaaaaa...'`), like we do elsewhere in the file with
   
   ```
       @mock.patch('uuid.uuid4')
       def test_gen_pod_extract_xcom(self, mock_uuid):
           mock_uuid.return_value = self.static_uuid
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r553314777



##########
File path: kubernetes_tests/test_kubernetes_executor.py
##########
@@ -122,6 +156,9 @@ def monitor_task(self, host, execution_date, dag_id, task_id, expected_final_sta
                     break
                 self._describe_resources(namespace="airflow")
                 self._describe_resources(namespace="default")
+                self._show_scheduler_log()

Review comment:
       still in debugging state :P




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751876192


   One difference I noticed between github action and local breeze is in github action run, scheduler pod is failing health check and has the following log:
   
   ```
     [2020-12-28 05:10:13,215] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 35
     [2020-12-28 05:10:13,740] {settings.py:290} DEBUG - Disposing DB connection pool (PID 7057)
     [2020-12-28 05:10:13,752] {settings.py:290} DEBUG - Disposing DB connection pool (PID 7064)
     [2020-12-28 05:10:13,756] {process_utils.py:201} INFO - Waiting up to 5 seconds for processes to exit...
     [2020-12-28 05:10:13,762] {process_utils.py:201} INFO - Waiting up to 5 seconds for processes to exit...
     [2020-12-28 05:10:13,768] {process_utils.py:61} INFO - Process psutil.Process(pid=7057, status='terminated', started='05:10:12') (7057) terminated with exit code None
     [2020-12-28 05:10:13,769] {process_utils.py:61} INFO - Process psutil.Process(pid=7064, status='terminated', started='05:10:12') (7064) terminated with exit code None
     [2020-12-28 05:10:13,769] {process_utils.py:61} INFO - Process psutil.Process(pid=35, status='terminated', exitcode=0, started='05:07:55') (35) terminated with exit code 0
     [2020-12-28 05:10:13,770] {scheduler_job.py:1297} INFO - Exited execute loop
     [2020-12-28 05:10:13,775] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
   ```
   
   But I am not able to reproduce this in local kubernetes test run at all. However, both local and remote run all failed on the same set of tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-756482631


   @ashb double checked the watcher code, it's using schedule job id label as the filter: `kwargs = {'label_selector': f'airflow-worker={scheduler_job_id}'}`, so changing pod name should have no impact to the watcher.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r553265547



##########
File path: tests/kubernetes/test_pod_generator.py
##########
@@ -477,14 +478,36 @@ def test_construct_pod_empty_executor_config(self, mock_uuid):
         worker_config.metadata.annotations = self.annotations
         worker_config.metadata.labels = self.labels
         worker_config.metadata.labels['app'] = 'myapp'
-        worker_config.metadata.name = 'pod_id-' + self.static_uuid.hex
+        worker_config.metadata.name = 'pod_id.' + self.static_uuid.hex
         worker_config.metadata.namespace = 'namespace'
         worker_config.spec.containers[0].env.append(
             k8s.V1EnvVar(name="AIRFLOW_IS_K8S_EXECUTOR_POD", value='True')
         )
         worker_config_result = self.k8s_client.sanitize_for_serialization(worker_config)
         self.assertEqual(worker_config_result, sanitized_result)
 
+    def ensure_max_label_length(self):
+        path = sys.path[0] + '/tests/kubernetes/pod_generator_base_with_secrets.yaml'
+        worker_config = PodGenerator.deserialize_model_file(path)
+
+        result = PodGenerator.construct_pod(
+            dag_id='a' * 512,
+            task_id='a' * 512,
+            pod_id='a' * 512,
+            kube_image='a' * 512,
+            try_number=3,
+            date=self.execution_date,
+            args=['command'],
+            namespace='namespace',
+            scheduler_job_id='a' * 512,
+            pod_override_object=None,
+            base_worker_pod=worker_config,
+        )
+
+        assert result.metadata.name < 253

Review comment:
       Could we also add a specific check here, like we do elsxewhere in the file with
   
   ```
       @mock.patch('uuid.uuid4')
       def test_gen_pod_extract_xcom(self, mock_uuid):
           mock_uuid.return_value = self.static_uuid
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r553732605



##########
File path: airflow/kubernetes/pod_generator.py
##########
@@ -370,6 +368,10 @@ def construct_pod(  # pylint: disable=too-many-arguments
         except Exception:  # pylint: disable=W0703
             image = kube_image
 
+        task_id = make_safe_label_value(task_id)
+        dag_id = make_safe_label_value(dag_id)
+        scheduler_job_id = make_safe_label_value(scheduler_job_id)

Review comment:
       I have added the `str` conversion, perhaps we should address the type annotation fix in a separate followup PR




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp removed a comment on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp removed a comment on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-755936064


   @potiuk I think i might have found the k8s test issue.
   
   I confirmed that the docker image built for the k8s test does contain my change:
   
   ```
   3:07:30 ❯ docker run -it docker.pkg.github.com/apache/airflow/master-python3.6:460204505 bash
   Unable to find image 'docker.pkg.github.com/apache/airflow/master-python3.6:460204505' locally
   460204505: Pulling from apache/airflow/master-python3.6
   852e50cd189d: Already exists
   334ed303e4ad: Already exists
   209dcde762c7: Already exists
   f9abeb9ca099: Already exists
   f750bc3b4393: Already exists
   6dfb08d9ecf4: Pull complete
   df1d95662566: Pull complete
   7b5270f850a8: Pull complete
   cd0c014ccf64: Pull complete
   972813881f3c: Pull complete
   6f19c4064553: Pull complete
   e4656094ab4e: Pull complete
   8524ff833442: Pull complete
   a1d0974a0892: Pull complete
   24c4330fe9a2: Pull complete
   ed0c8b4977e8: Pull complete
   Digest: sha256:02e5c5c6175529a988e95143483c20dd42c7754d9219d4bbae9b64d655cb821e
   Status: Downloaded newer image for docker.pkg.github.com/apache/airflow/master-python3.6:460204505
   airflow@56f39731592a:/opt/airflow$ python
   Python 3.6.12 (default, Nov 25 2020, 03:59:00)
   [GCC 8.3.0] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> from airflow.executors.kubernetes_executor import MAX_POD_ID_LEN
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ImportError: cannot import name 'MAX_POD_ID_LEN'
   >>>
   airflow@56f39731592a:/opt/airflow$ grep MAX_POD_ID_LEN  /home/airflow/.local/lib/python3.6/site-packages/airflow/kubernetes/pod_generator.py
   airflow@56f39731592a:/opt/airflow$ exit
   ```
   
   Noticed the image id for this image is `23c1f6187642`:
   
   ```bash
   23:13:31 ❯ docker images |grep docker.pkg.github.com/apache/airflow/master-python3.6
   docker.pkg.github.com/apache/airflow/master-python3.6          460204505                                       23c1f6187642        3 days ago          845MB
   ```
   
   However, if you look into the debug log from k8s test, both web and scheduler pods are launched with a different image with the following id:
   
   ```
     Containers:
       scheduler:
         Container ID:  containerd://ed87933da7e8d8e2b0b79d14c859650c78d040d2f48849369db476ce8ec5a00a
         Image:         apache/airflow:master-python3.6-kubernetes
         Image ID:      sha256:3b65120f9e29723f3dc97b057231ee9cec8e9fd32eddc08897b2501ff90775dd
   ```
   
   Do you know where does that `apache/airflow:master-python3.6-kubernetes` image come from?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r553732605



##########
File path: airflow/kubernetes/pod_generator.py
##########
@@ -370,6 +368,10 @@ def construct_pod(  # pylint: disable=too-many-arguments
         except Exception:  # pylint: disable=W0703
             image = kube_image
 
+        task_id = make_safe_label_value(task_id)
+        dag_id = make_safe_label_value(dag_id)
+        scheduler_job_id = make_safe_label_value(scheduler_job_id)

Review comment:
       I have added the `str` conversion, perhaps we should address the type annotation fix in a separate followup PR if it's not going to be trivial to figure out.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751294094


   Is the k8s image test supposed to be flaky? I am seeing failures from it in my other unrelated PRs as well as master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-764143012


   Ping @dimberman 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-752697529


   Hmm. Temporary GitHub problem again @houqp :( one more rebase needed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp edited a comment on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751437194


   @potiuk looks like the kind cluster is not picking up the code change in my branch. I am able to reproduce this locally with breeze and the task pod names are created with `-` instead of `.` as uuid prefix. I looked into the scheduler pod config, it's using `apache/airflow:master-python3.6-kubernetes` as the container image. I also double checked the `/home/airflow/.local/lib/python3.6/site-packages/airflow/kubernetes/pod_generator.py` file in scheduler pod and it indeed doesn't have my change.
   
   For any executor code change, what other changes do I need to make in order to get the kind cluster pick up my code?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#discussion_r552939598



##########
File path: airflow/kubernetes/pod_generator.py
##########
@@ -370,6 +368,10 @@ def construct_pod(  # pylint: disable=too-many-arguments
         except Exception:  # pylint: disable=W0703
             image = kube_image
 
+        task_id = make_safe_label_value(task_id)
+        dag_id = make_safe_label_value(dag_id)
+        scheduler_job_id = make_safe_label_value(scheduler_job_id)

Review comment:
       i do think it would be more consistent if we use int everywhere since that's the type we get from db. but i haven't had the time to check if it's been converted to str already somewhere upstream.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751304681


   Should be fixed with latest fix @houqp :). Can you please rebase and check?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] grepthat commented on pull request #13299: fix 422 invalid value error caused by long k8s pod name

Posted by GitBox <gi...@apache.org>.
grepthat commented on pull request #13299:
URL: https://github.com/apache/airflow/pull/13299#issuecomment-751343456


   Nice, this probably closes https://github.com/apache/airflow/issues/13189


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org