Posted to commits@airflow.apache.org by "Jarek Potiuk (JIRA)" <ji...@apache.org> on 2019/01/03 09:00:09 UTC

[jira] [Assigned] (AIRFLOW-3615) Connection parsed from URI - case-insensitive UNIX socket paths in python 2.7 -> 3.5 (but not in 3.6)

     [ https://issues.apache.org/jira/browse/AIRFLOW-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarek Potiuk reassigned AIRFLOW-3615:
-------------------------------------

    Assignee: Jarek Potiuk

> Connection parsed from URI - case-insensitive UNIX socket paths in python 2.7 -> 3.5 (but not in 3.6) 
> ------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3615
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3615
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Jarek Potiuk
>            Assignee: Jarek Potiuk
>            Priority: Major
>
> There is a problem with the case sensitivity of URI parsing for database connections that use local UNIX sockets rather than TCP connections.
> For local UNIX sockets, the hostname part of the URI contains the URL-encoded local socket path rather than an actual hostname. If this path contains uppercase characters, urlparse deliberately lowercases them when parsing. This is perfectly fine for hostnames: according to [https://tools.ietf.org/html/rfc3986#section-6.2.3], case normalisation should be done for hostnames.
> However, urlparse still populates hostname when the URI does not contain a host but only a local path (i.e. when the location starts with %2F ("/")). What's more, the host gets converted to lowercase in Python 2.7 through 3.5. Surprisingly, this is somewhat "fixed" in 3.6: if the URL location starts with %2F, the hostname is no longer normalized to lowercase (see the snippets below showing the behaviour for different Python versions).
> This problem bubbles up in Airflow's Connection. Airflow uses urlparse to get the hostname/path in models.py:parse_from_uri, and for UNIX sockets the socket path is read from hostname. There is no other reliable way to do this with urlparse, because the location can also contain 'authority' information (user/password) and it is urlparse's job to separate them out. Airflow's Connection similarly makes no distinction between TCP and local socket connections and uses the host field to store the socket path (the field itself is case sensitive, however). So you can use UPPERCASE when you define a connection in the database, but parsing connections from environment variables is a problem, because we currently cannot pass a URI whose socket path contains UPPERCASE characters.
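To make the URI shape discussed above concrete, here is a minimal sketch of how a socket path ends up percent-encoded in the host position. The helper name build_socket_uri, the credentials, the socket path and the database name are all hypothetical and chosen only for illustration; this is not Airflow code, and it targets Python 3 only.

```python
from urllib.parse import quote


def build_socket_uri(scheme, user, password, socket_path, database):
    """Build a connection URI whose host part is a percent-encoded socket path.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    # safe="" forces "/" to be encoded as %2F, so the whole socket path
    # stays inside the host position of the URI.
    encoded_host = quote(socket_path, safe="")
    return "{}://{}:{}@{}/{}".format(scheme, user, password, encoded_host, database)


# Hypothetical socket directory containing uppercase characters.
uri = build_socket_uri("postgres", "user", "secret", "/tmp/MySockDir", "mydb")
print(uri)
```

The uppercase characters in the encoded host are exactly the part that gets lowercased when the value is later read back through the hostname attribute on the affected Python versions.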
> Since urlparse is really meant for parsing URLs and is not well suited to non-URL URIs, we should likely use a different parser that handles more generic URIs and does not lowercase the path on any version of Python.
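One possible shape of such a case-preserving approach is sketched below. It is an assumption for illustration, not the fix adopted in Airflow: instead of reading hostname, it takes the raw netloc (which urlparse returns unmodified), strips the userinfo and a numeric port by hand, and only then percent-decodes. The function name socket_path_from_uri is hypothetical, and the code targets Python 3 only.

```python
from urllib.parse import unquote, urlparse


def socket_path_from_uri(uri):
    """Return the percent-decoded host part of `uri` with its case preserved.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    # netloc is the raw "user:password@host:port" string, never lowercased.
    netloc = urlparse(uri).netloc
    # Drop the optional "user:password@" prefix.
    host_port = netloc.rpartition("@")[2]
    # Drop a trailing ":port" only when the port is purely numeric.
    host, sep, port = host_port.rpartition(":")
    if sep and port.isdigit():
        host_port = host
    # Decode %2F and friends only after the structural characters are gone.
    return unquote(host_port)


print(socket_path_from_uri("postgres://user:secret@%2Ftmp%2FMySockDir:5432/mydb"))
```

Decoding last matters: the "@" and ":" separators are located while the socket path is still percent-encoded, so characters inside the path cannot be mistaken for them. Edge cases such as an empty port are not handled in this sketch.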
> I think we could also consider adding a local path field to the Connection model and using it instead of the hostname to store the socket path. This approach would be the "correct" one, but it might introduce some compatibility issues, so it may not be worth it, considering that host is case sensitive in Airflow.
> Snippets showing urlparse behaviour in different Python versions:
> {quote}Python 2.7.10 (default, Aug 17 2018, 19:45:58)
>  [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
>  Type "help", "copyright", "credits" or "license" for more information.
>  >>> from urlparse import urlparse,unquote
>  >>> conn = urlparse("http://AAA")
>  >>> conn.hostname
>  'aaa'
>  >>> conn = urlparse("http://%2FAAA")
>  >>> conn.hostname
>  '%2faaa'
> {quote}
>  
> {quote}Python 3.5.4 (v3.5.4:3f56838976, Aug 7 2017, 12:56:33)
>  [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>  Type "help", "copyright", "credits" or "license" for more information.
>  >>> from urlparse import urlparse,unquote
>  Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  ImportError: No module named 'urlparse'
>  >>> from urllib.parse import urlparse,unquote
>  >>> conn = urlparse("http://AAA")
>  >>> conn.hostname
>  'aaa'
>  >>> conn = urlparse("http://%2FAAA")
>  >>> conn.hostname
>  '%2faaa'
> {quote}
>  
> {quote}Python 3.6.7 (v3.6.7:6ec5cf24b7, Oct 20 2018, 03:02:14)
>  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
>  Type "help", "copyright", "credits" or "license" for more information.
>  >>> from urllib.parse import urlparse,unquote
>  >>> conn = urlparse("http://AAA")
>  >>> conn.hostname
>  'aaa'
>  >>> conn = urlparse("http://%2FAAA")
>  >>> conn.hostname
>  {color:#ff0000}'%2FAAA'{color}
> {quote}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)