Posted to commits@airflow.apache.org by "Ash Berlin-Taylor (JIRA)" <ji...@apache.org> on 2019/07/02 11:49:00 UTC

[jira] [Updated] (AIRFLOW-4862) Allow directly using IP address as hostname in airflow.utils.net.get_hostname()

     [ https://issues.apache.org/jira/browse/AIRFLOW-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ash Berlin-Taylor updated AIRFLOW-4862:
---------------------------------------
    Fix Version/s:     (was: 1.10.4)
                   1.10.5

We may not have a 1.10.5 (cherry-picks from master are getting increasingly difficult), but I am setting this to that version in case we do.

> Allow directly using IP address as hostname in airflow.utils.net.get_hostname()
> -------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4862
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4862
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: utils
>    Affects Versions: 1.10.3
>            Reporter: Xiaodong DENG
>            Assignee: Xiaodong DENG
>            Priority: Minor
>             Fix For: 1.10.5
>
>
> In airflow.utils.net.get_hostname(), the default function used to get the hostname of a node (such as a worker) is *socket.getfqdn()*, which returns the fully qualified domain name.
> In some cases, we need to ensure that hostnames can be resolved so that nodes can talk to each other. For example, if I use S3 for remote logging, the log is only pushed to S3 after the job has finished (whether it succeeded or failed); while the job is still running, the webserver first checks whether the log is available in its own volume and, if not, fetches the log from the worker.
> If the workers' hostnames are something like "airflow-worker-53-4bp8v" (e.g., when running on OpenShift or K8S), the hostname may not be resolvable, and we then observe errors like the one below:
> {code:java}
> *** Log file does not exist: /opt/app-root/airflow/logs/example_python_operator/sleep_for_3/2019-06-18T08:14:15.313472+00:00/2.log
> *** Fetching from: http://airflow-worker-57-n69vb:8793/log/example_python_operator/sleep_for_3/2019-06-18T08:14:15.313472+00:00/2.log
> *** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-57-n69vb', port=8793): Max retries exceeded with url: /log/example_python_operator/sleep_for_3/2019-06-18T08:14:15.313472+00:00/2.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fec3266a6d8>: Failed to establish a new connection: [Errno -2] Name or service not known',))
> {code}
>  
> This may be addressed by properly setting up service discovery, but users may not always have the privileges to do that (like myself in my organization).
> Another solution is to change "hostname_callable" in Airflow's configuration ([core] section). However, this only works if you have a function that returns a resolvable hostname and takes no arguments (due to the existing implementation: https://github.com/apache/airflow/blob/dd08ae3469a50a145f9ae7f819ed1840fe2a5bd6/airflow/utils/net.py#L41-L45).
>  
> The change I would like to propose is: allow users to use the IP address directly as the hostname.
>  
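
As a rough illustration of the proposal, a zero-argument function along the following lines could be used as the "hostname_callable". This is only a sketch: the name host_ip_address is hypothetical and not taken from the issue, and it assumes the node's own fully qualified name can at least be resolved locally on that node.

```python
import socket


def host_ip_address():
    """Return this host's IPv4 address as a string.

    Hypothetical helper for illustration only. It takes no arguments,
    which is what the existing hostname_callable mechanism requires.
    """
    # socket.getfqdn() gives the node's own name; socket.gethostbyname()
    # resolves that name to an IPv4 address. If the name cannot be
    # resolved locally, gethostbyname() raises an OSError subclass, which
    # is deliberately not caught here so that the failure stays visible.
    return socket.gethostbyname(socket.getfqdn())


if __name__ == "__main__":
    print(host_ip_address())
```

If the function lived in a module importable by Airflow (say, a hypothetical my_company.net), the [core] setting would presumably read hostname_callable = my_company.net:host_ip_address, assuming the module:attribute form parsed by the implementation linked above. The webserver would then fetch logs from the worker by IP address rather than by a name it cannot resolve.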



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)