Posted to dev@airflow.apache.org by Ping Zhang <pi...@umich.edu> on 2020/09/05 19:29:05 UTC

Re: Optimize KubernetesExecutor pod labels to task instance key

Thanks, Daniel. The PR was merged:
https://github.com/apache/airflow/pull/10568/files

Best wishes

Ping Zhang


On Mon, Aug 24, 2020 at 9:31 AM Daniel Imberman <da...@gmail.com>
wrote:

> Hi Ping,
>
> I think that’s a great idea! Would be glad to help merge this.
>
> On Sun, Aug 23, 2020 at 11:33 PM, Ping Zhang <pi...@umich.edu> wrote:
> Hi everyone,
>
> While evaluating *KubernetesExecutor*, I found that `*_labels_to_key*` (see
> code
> <
> https://github.com/apache/airflow/blob/master/airflow/executors/kubernetes_executor.py#L608-L674
> >)
> is inefficient: on a large Airflow cluster it can issue a very expensive DB
> query whenever the dag_id or task_id contains characters outside the
> character set allowed for Kubernetes labels
> <
> https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set
> >
> .
>
> I propose recording the task instance key in Pod annotations instead, since
> annotation values are not restricted to the label character set. When
> streaming events from k8s, the annotations can be read from
> `*task.metadata.annotations*`; see this code example
> <
> https://gist.github.com/pingzh/f3488116304b81d73d1bed3c53a5c85f#file-stream_pod-py
> >
> .
>
> Please let me know your thoughts before I start to upstream my changes.
>
> Best wishes
>
> Ping Zhang
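
For readers of the archive, the idea discussed in this thread can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the merged PR: `TaskKey`, `sanitize_label_value`, `key_to_annotations`, `annotations_to_key`, and the annotation key names are all assumptions made for the example, and a plain dict stands in for `task.metadata.annotations`. It shows that a label value has to be sanitized (which loses information and can make two ids collide, so the original id must be looked up elsewhere), whereas an annotation value can carry the id unchanged.

```python
import re
from datetime import datetime, timezone
from typing import Dict, NamedTuple

# Kubernetes label values: at most 63 characters; either empty, or alphanumeric
# at both ends with only alphanumerics, '-', '_' and '.' in between.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_LEN = 63


class TaskKey(NamedTuple):
    """Hypothetical stand-in for a task instance key."""
    dag_id: str
    task_id: str
    execution_date: datetime
    try_number: int


def is_valid_label_value(value: str) -> bool:
    """Check a string against the Kubernetes label value syntax."""
    return len(value) <= MAX_LABEL_LEN and LABEL_VALUE_RE.fullmatch(value) is not None


def sanitize_label_value(value: str) -> str:
    """Hypothetical lossy sanitizer: drop disallowed characters, then truncate."""
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "", value)[:MAX_LABEL_LEN]
    return cleaned.strip("-_.")


def key_to_annotations(key: TaskKey) -> Dict[str, str]:
    """Serialize the key into string-valued annotations (key names are assumed)."""
    return {
        "dag_id": key.dag_id,
        "task_id": key.task_id,
        "execution_date": key.execution_date.isoformat(),
        "try_number": str(key.try_number),
    }


def annotations_to_key(annotations: Dict[str, str]) -> TaskKey:
    """Rebuild the key from annotations alone, with no database lookup."""
    return TaskKey(
        dag_id=annotations["dag_id"],
        task_id=annotations["task_id"],
        execution_date=datetime.fromisoformat(annotations["execution_date"]),
        try_number=int(annotations["try_number"]),
    )
```

With these helpers, `sanitize_label_value("etl@daily")` and `sanitize_label_value("etl daily")` produce the same label value, so the label alone cannot identify the DAG, while `annotations_to_key(key_to_annotations(key))` returns the original key for any id.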