You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/01 13:26:58 UTC

[GitHub] [airflow] hterik opened a new issue, #26101: Kubernetes Invalid executor_config, pod_override filled with Encoding.VAR

hterik opened a new issue, #26101:
URL: https://github.com/apache/airflow/issues/26101

   ### Apache Airflow version
   
   2.3.4
   
   ### What happened
   
   Trying to start Kubernetes tasks using a `pod_override` results in pods not starting after upgrading from 2.3.2 to 2.3.4
   
   The pod_override look very odd, filled with many Encoding.VAR objects, see following scheduler log:
   ```
   {kubernetes_executor.py:550} INFO - Add task TaskInstanceKey(dag_id='commit_check', task_id='sync_and_build', run_id='5776-2-1662037155', try_number=1, map_index=-1) with command ['airflow', 'tasks', 'run', 'commit_check', 'sync_and_build', '5776-2-1662037155', '--local', '--subdir', 'DAGS_FOLDER/dag_on_commit.py'] with executor_config {'pod_override': {'Encoding.VAR': {'Encoding.VAR': {'Encoding.VAR': {'metadata': {'Encoding.VAR': {'annotations': {'Encoding.VAR': {}, 'Encoding.TYPE': 'dict'}}, 'Encoding.TYPE': 'dict'}, 'spec': {'Encoding.VAR': {'containers': REDACTED 'Encoding.TYPE': 'k8s.V1Pod'}, 'Encoding.TYPE': 'dict'}}
   {kubernetes_executor.py:554} ERROR - Invalid executor_config for TaskInstanceKey(dag_id='commit_check', task_id='sync_and_build', run_id='5776-2-1662037155', try_number=1, map_index=-1)
   ```
   
   Looking in the UI, the task get stuck in scheduled state forever. By clicking instance details, it shows similar state of the pod_override with many Encoding.VAR. 
   
   
   This appears like a recent addition, in 2.3.4 via https://github.com/apache/airflow/pull/24356. 
   @dstandish  do you understand if this is connected?
   
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-celery==3.0.0
   apache-airflow-providers-cncf-kubernetes==4.3.0
   apache-airflow-providers-common-sql==1.1.0
   apache-airflow-providers-docker==3.1.0
   apache-airflow-providers-ftp==3.1.0
   apache-airflow-providers-http==4.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-postgres==5.2.0
   apache-airflow-providers-sqlite==3.2.0
   kubernetes==23.6.0
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] joshzana commented on issue #26101: Kubernetes Invalid executor_config, pod_override filled with Encoding.VAR

Posted by GitBox <gi...@apache.org>.
joshzana commented on issue #26101:
URL: https://github.com/apache/airflow/issues/26101#issuecomment-1234552838

   +1 seeing the same thing on 2.3.4, KubernetesExector.  Wanted to provide more info.
   
   We run 40 DAGs with several thousand total daily tasks, and every day we have 5-10 tasks that get stuck in Queued state.
   
   We see the following sequence in the logs:
   First this error the first time the task attempts to run
   ```
   Invalid executor_config for ["tenant_extraction_acme_recruiting_id_72","discovery_finder_on_29735_agg_month","scheduled__2022-08-31T11:00:00+00:00",1,-1]
   
   ```
   
   The in a repeated loop we see logs like this:
   ```
   {base_executor.py:211} INFO - task TaskInstanceKey(dag_id='tenant_extraction_acme_recruiting_id_72', task_id='discovery_finder_on_29735_agg_month', run_id='scheduled__2022-08-31T11:00:00+00:00', try_number=1, map_index=-1) is still running
    ```
   for each stuck task, followed shortly by:
   ```
   {base_executor.py:215} ERROR - could not queue task TaskInstanceKey(dag_id='tenant_extraction_acme_recruiting_id_72', task_id='discovery_finder_on_29735_agg_month', run_id='scheduled__2022-08-31T11:00:00+00:00', try_number=1, map_index=-1) (still running after 4 attempts)
   ```
   
   This continues until the task is marked as failed in the airflow UI.
   
   
   The executor_config on the invalid tasks looks like this:
   ```
   {'pod_override': {<Encoding.VAR: '__var'>: {'spec': {'containers': [{'name': 'base', 'resources': {'limits': {}, 'requests': {'memory': '16Gi', 'cpu': '1'}}}]}}, <Encoding.TYPE: '__type'>: <DagAttributeTypes.POD: 'k8s.V1Pod'>}}
   ```
   
   
   Which is odd, because for the thousands of unaffected tasks it looks like this:
   ```
   {'pod_override': {'api_version': None, 'kind': None, 'metadata': None, 'spec': {'active_deadline_seconds': None, 'affinity': None, 'automount_service_account_token': None, 'containers': [{'args': None, 'command': None, 'env': None, 'env_from': None, 'image': None, 'image_pull_policy': None, 'lifecycle': None, 'liveness_probe': None, 'name': 'base', 'ports': None, 'readiness_probe': None, 'resources': {'limits': {}, 'requests': {'cpu': '1', 'memory': '16Gi'}}, 'security_context': None, 'startup_probe': None, 'stdin': None, 'stdin_once': None, 'termination_message_path': None, 'termination_message_policy': None, 'tty': None, 'volume_devices': None, 'volume_mounts': None, 'working_dir': None}], 'dns_config': None, 'dns_policy': None, 'enable_service_links': None, 'ephemeral_containers': None, 'host_aliases': None, 'host_ipc': None, 'host_network': None, 'host_pid': None, 'hostname': None, 'image_pull_secrets': None, 'init_containers': None, 'node_name': None, 'node_selector': None, '
 os': None, 'overhead': None, 'preemption_policy': None, 'priority': None, 'priority_class_name': None, 'readiness_gates': None, 'restart_policy': None, 'runtime_class_name': None, 'scheduler_name': None, 'security_context': None, 'service_account': None, 'service_account_name': None, 'set_hostname_as_fqdn': None, 'share_process_namespace': None, 'subdomain': None, 'termination_grace_period_seconds': None, 'tolerations': None, 'topology_spread_constraints': None, 'volumes': None}, 'status': None}}
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pacificsky commented on issue #26101: Kubernetes Invalid executor_config, pod_override filled with Encoding.VAR

Posted by GitBox <gi...@apache.org>.
pacificsky commented on issue #26101:
URL: https://github.com/apache/airflow/issues/26101#issuecomment-1235994405

   @dstandish this seems to introduce non-deterministic behavior for any pod that uses KubernetesExecuter and specifies a pod override. Only pods that specify an override seem to be impacted. And the issue seems to be that the deserialization of the executer config is broken in some way in the new system.
   
   This can be easily repro'd in the airflow web UI by hitting reload repeatedly on the task instance details pane for a task that uses KubernetesExecuter and specifies an executor config. I was even able to cause the web ui to crash as a result.
   
   
   [airflow_pickle_bug.txt](https://github.com/apache/airflow/files/9481532/airflow_pickle_bug.txt)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jedcunningham closed issue #26101: Kubernetes Invalid executor_config, pod_override filled with Encoding.VAR

Posted by GitBox <gi...@apache.org>.
jedcunningham closed issue #26101: Kubernetes Invalid executor_config, pod_override filled with Encoding.VAR
URL: https://github.com/apache/airflow/issues/26101


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org