Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/01 17:06:54 UTC

[GitHub] [airflow] joshzana commented on issue #26101: Kubernetes Invalid executor_config, pod_override filled with Encoding.VAR

joshzana commented on issue #26101:
URL: https://github.com/apache/airflow/issues/26101#issuecomment-1234552838

   +1, seeing the same thing on 2.3.4 with KubernetesExecutor. Wanted to provide more info.
   
   We run 40 DAGs with several thousand total daily tasks, and every day we have 5-10 tasks that get stuck in Queued state.
   
   We see the following sequence in the logs:
   First, this error appears the first time the task attempts to run:
   ```
   Invalid executor_config for ["tenant_extraction_acme_recruiting_id_72","discovery_finder_on_29735_agg_month","scheduled__2022-08-31T11:00:00+00:00",1,-1]
   
   ```
   
   Then, in a repeated loop, we see logs like this:
   ```
   {base_executor.py:211} INFO - task TaskInstanceKey(dag_id='tenant_extraction_acme_recruiting_id_72', task_id='discovery_finder_on_29735_agg_month', run_id='scheduled__2022-08-31T11:00:00+00:00', try_number=1, map_index=-1) is still running
   ```
   for each stuck task, followed shortly by:
   ```
   {base_executor.py:215} ERROR - could not queue task TaskInstanceKey(dag_id='tenant_extraction_acme_recruiting_id_72', task_id='discovery_finder_on_29735_agg_month', run_id='scheduled__2022-08-31T11:00:00+00:00', try_number=1, map_index=-1) (still running after 4 attempts)
   ```
   
   This continues until the task is marked as failed in the Airflow UI.
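
   For anyone who wants to count how many tasks hit this, here is a minimal sketch that scans scheduler log lines for the "could not queue task" message quoted above. It assumes the log format is exactly as shown in this comment, which may differ between Airflow versions. `STUCK_RE` and `find_stuck_tasks` are illustrative names, not part of Airflow.

```python
import re

# Pattern derived from the "could not queue task" line quoted above.
# Assumption: the TaskInstanceKey repr has exactly these five fields in this order.
STUCK_RE = re.compile(
    r"could not queue task TaskInstanceKey\("
    r"dag_id='(?P<dag_id>[^']+)', "
    r"task_id='(?P<task_id>[^']+)', "
    r"run_id='(?P<run_id>[^']+)', "
    r"try_number=(?P<try_number>-?\d+), "
    r"map_index=(?P<map_index>-?\d+)\)"
)


def find_stuck_tasks(lines):
    """Return unique (dag_id, task_id, run_id) triples, in first-seen order."""
    seen = []
    for line in lines:
        match = STUCK_RE.search(line)
        if match is None:
            continue
        key = (match["dag_id"], match["task_id"], match["run_id"])
        if key not in seen:
            seen.append(key)
    return seen
```

   Feeding it the lines of a scheduler log file (for example `find_stuck_tasks(open(path))`, with `path` being wherever your scheduler logs live) gives one entry per stuck task, even though the message repeats in a loop.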
   
   
   The executor_config on the invalid tasks looks like this:
   ```
   {'pod_override': {<Encoding.VAR: '__var'>: {'spec': {'containers': [{'name': 'base', 'resources': {'limits': {}, 'requests': {'memory': '16Gi', 'cpu': '1'}}}]}}, <Encoding.TYPE: '__type'>: <DagAttributeTypes.POD: 'k8s.V1Pod'>}}
   ```
   
   
   Which is odd, because for the thousands of unaffected tasks it looks like this:
   ```
   {'pod_override': {'api_version': None, 'kind': None, 'metadata': None, 'spec': {'active_deadline_seconds': None, 'affinity': None, 'automount_service_account_token': None, 'containers': [{'args': None, 'command': None, 'env': None, 'env_from': None, 'image': None, 'image_pull_policy': None, 'lifecycle': None, 'liveness_probe': None, 'name': 'base', 'ports': None, 'readiness_probe': None, 'resources': {'limits': {}, 'requests': {'cpu': '1', 'memory': '16Gi'}}, 'security_context': None, 'startup_probe': None, 'stdin': None, 'stdin_once': None, 'termination_message_path': None, 'termination_message_policy': None, 'tty': None, 'volume_devices': None, 'volume_mounts': None, 'working_dir': None}], 'dns_config': None, 'dns_policy': None, 'enable_service_links': None, 'ephemeral_containers': None, 'host_aliases': None, 'host_ipc': None, 'host_network': None, 'host_pid': None, 'hostname': None, 'image_pull_secrets': None, 'init_containers': None, 'node_name': None, 'node_selector': None, 'os': None, 'overhead': None, 'preemption_policy': None, 'priority': None, 'priority_class_name': None, 'readiness_gates': None, 'restart_policy': None, 'runtime_class_name': None, 'scheduler_name': None, 'security_context': None, 'service_account': None, 'service_account_name': None, 'set_hostname_as_fqdn': None, 'share_process_namespace': None, 'subdomain': None, 'termination_grace_period_seconds': None, 'tolerations': None, 'topology_spread_constraints': None, 'volumes': None}, 'status': None}}
   ```
   ```
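
   The difference between the two shapes can be checked mechanically. Below is a rough sketch of such a check, not a fix: it flags an `executor_config` whose `pod_override` still carries the `__var` / `__type` wrapper keys seen in the invalid example. The `Encoding` class here is a local stand-in modeled on the repr above (an assumption, not an import from Airflow), and `is_still_serialized` is a hypothetical helper name.

```python
from enum import Enum


class Encoding(str, Enum):
    """Local stand-in for the enum shown in the repr above (assumed shape)."""

    TYPE = "__type"
    VAR = "__var"


def is_still_serialized(executor_config):
    """Return True if pod_override looks like the serialized wrapper, not a pod dict."""
    pod_override = executor_config.get("pod_override")
    if not isinstance(pod_override, dict):
        return False
    # Keys may be enum members (as in the repr above) or plain strings,
    # so normalize both to their string values before comparing.
    keys = {k.value if isinstance(k, Enum) else k for k in pod_override}
    return {"__var", "__type"} <= keys
```

   Applied to the two configs above, this would return True for the invalid one and False for the unaffected one, which at least makes the affected tasks easy to list.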

