Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/30 05:21:51 UTC

[GitHub] [airflow] jonstacks commented on issue #10292: Kubernetes Executor - invalid hostname in the database

jonstacks commented on issue #10292:
URL: https://github.com/apache/airflow/issues/10292#issuecomment-683378076


   I'm also seeing this in our EKS clusters (version 1.15.11) with Airflow 1.10.11 and 1.10.12 using the Kubernetes Executor. Everything else seems to be working as expected, with the exception of logs.
   
   I think I've traced it back to how the hostname gets set in the DB. I think this call to get_hostname is the issue:
   
   https://github.com/apache/airflow/blob/4e3799fec4c23d0f43603a0489c5a6158aeba035/airflow/utils/net.py#L31-L36
   
   If you exec into a pod with a long pod name (> 63 chars) and run `hostname`, you should get the truncated name you are seeing in the database, and `echo -n $(hostname) | wc -c` should report a length of 63 characters. At least that is what happens for me. I think it has to do with the maximum length of a label in an FQDN being 63 characters.
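
   To illustrate the mismatch, here is a minimal sketch that assumes the hostname is simply the pod name cut off at 63 characters. The pod name and the `truncate_hostname` helper are made up for the example and are not part of Airflow or Kubernetes.

```python
# Illustrative only: mimics a 63-character hostname limit applied to a long pod name.
MAX_LABEL_LENGTH = 63  # maximum length of a single DNS label


def truncate_hostname(pod_name: str, limit: int = MAX_LABEL_LENGTH) -> str:
    """Return the pod name cut to `limit` characters (plain truncation is assumed)."""
    return pod_name[:limit]


# Hypothetical pod name that is longer than 63 characters.
pod_name = "examplelongdagname" + "x" * 40 + "-0123456789abcdef0123456789abcdef"
hostname = truncate_hostname(pod_name)

print(len(pod_name), len(hostname))
print(hostname == pod_name)  # False: the stored hostname no longer matches the pod name
```

   If the value stored in the DB is the truncated one, anything that later looks the pod up by that value will not find it.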
   
   It looks like Airflow has an environment variable for this callable, and we could do something like this: https://stackoverflow.com/a/62905570. If there is an easy way to expose the log port on the executor pods, I think we could override the callable through that environment variable so that it returns the IP. I didn't see anything in the documentation for exposing the port, except maybe a pod template file, but I think that overrides the whole pod.
   
   It looks like it shouldn't be a problem from the side that looks at the logs:
   
   https://github.com/apache/airflow/blob/4e3799fec4c23d0f43603a0489c5a6158aeba035/airflow/utils/log/file_task_handler.py#L109-L123
   
   I am new to managing Airflow for our company, but I'm wondering if it would make sense to inject a `POD_NAME` environment variable into the container by default using the downward API and, if that environment variable is set, return it for the hostname so that it gets set in the DB. Hopefully that would make the default configuration handle the case where the pod name is really long.
   
   Something like:
   ```yaml
   - name: POD_NAME
     valueFrom:
       fieldRef:
         fieldPath: metadata.name
   ```
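
   On the Python side, the idea could look roughly like the sketch below. This is not Airflow's current code: `pod_aware_hostname` is a hypothetical name, and the `POD_NAME` default is an assumption that has to match whatever variable name the pod spec actually injects.

```python
import os
import socket


def pod_aware_hostname(env_var: str = "POD_NAME") -> str:
    """Prefer the full pod name from the environment, else fall back to the FQDN.

    `env_var` is an assumption: it must match the variable name injected
    through the downward API in the pod spec.
    """
    pod_name = os.environ.get(env_var)
    if pod_name:
        # The downward API provides the untruncated metadata.name.
        return pod_name
    # Not running in a pod (or the variable is not injected): use the FQDN.
    return socket.getfqdn()


if __name__ == "__main__":
    print(pod_aware_hostname())
```

   A function like this could either replace the default inside `get_hostname` or be selected through the hostname callable setting mentioned above, so deployments outside Kubernetes keep the existing behaviour.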
   
   We are currently trying to get this working so that we can use the KubernetesExecutor in production instead of the CeleryExecutor. If anyone knows a good way around this, I would be grateful to learn the best way to handle it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org