Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/20 16:29:58 UTC

[GitHub] [airflow] potiuk commented on issue #25186: should we normalize scheme and hostname of URI to be lower?

potiuk commented on issue #25186:
URL: https://github.com/apache/airflow/issues/25186#issuecomment-1190498704

   I believe we should just follow the standards, also because anyone who wants to compare URIs might, for example, use a tool that is RFC 3986 compliant:
   
   This is the URI RFC: https://datatracker.ietf.org/doc/html/rfc3986 (note that this is not the HTTP URL RFC; it is the generic URI RFC, so it applies directly to our case).
   
   * Although schemes are case-insensitive, the canonical form is lowercase, and documents that specify schemes must do so with lowercase letters.
   * The host subcomponent is case-insensitive.
   * Although the host is case-insensitive, producers and normalizers should use lowercase for registered names and hexadecimal addresses for the sake of uniformity, while only using uppercase letters for percent-encodings.
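A minimal Python sketch of these rules, using only the standard library's `urllib.parse`. The helper name `normalize_scheme_and_host` is hypothetical (not an existing Airflow API), and the sketch assumes the URI has an authority component (`scheme://...`). It lowercases only the scheme and the host; userinfo, port, path, query, and fragment are left as they are, and percent-encoded characters inside the host are not specially handled.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_scheme_and_host(uri: str) -> str:
    """Lowercase the scheme and host of a URI (RFC 3986, section 6.2.2.1).

    Hypothetical helper: every other component keeps its original case,
    because the generic syntax treats it as case-sensitive.
    """
    parts = urlsplit(uri)
    # Split "user:pass@host:port" at the last "@" so that only the
    # host (and the all-digit port) is lowercased; userinfo stays as-is.
    userinfo, sep, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + sep + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


print(normalize_scheme_and_host("HTTP://www.EXAMPLE.com/"))
print(normalize_scheme_and_host("Postgres://User:PaSS@DB.Example.com:5432/Schema"))
```

Splitting the netloc with `rpartition("@")` keeps credentials such as `User:PaSS` case-sensitive, which matters for connection URIs that embed a password.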
   
   
   ```
   6.2.2.1.  Case Normalization
   
      For all URIs, the hexadecimal digits within a percent-encoding
      triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore
      should be normalized to use uppercase letters for the digits A-F.
   
      When a URI uses components of the generic syntax, the component
      syntax equivalence rules always apply; namely, that the scheme and
      host are case-insensitive and therefore should be normalized to
      lowercase.  For example, the URI <HTTP://www.EXAMPLE.com/> is
      equivalent to <http://www.example.com/>.  The other generic syntax
      components are assumed to be case-sensitive unless specifically
      defined otherwise by the scheme (see Section 6.2.3).
   ```
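The percent-encoding rule in that passage is mechanical enough to sketch with a regular expression. This is an illustration, not Airflow code: `normalize_percent_encoding` is a hypothetical helper name, and it assumes the input is already a syntactically valid URI, so any `%` followed by two hex digits is a percent-encoding triplet.

```python
import re

# A percent-encoding triplet: "%" followed by exactly two hex digits.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_percent_encoding(uri: str) -> str:
    """Uppercase the hex digits of every percent-encoding triplet.

    Hypothetical helper: only the triplets change ("%3a" -> "%3A");
    all other characters keep their original case.
    """
    return _PCT_TRIPLET.sub(lambda m: m.group(0).upper(), uri)


print(normalize_percent_encoding("s3://bucket/my%2fKey%3aV1"))
```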
   
   The rest is case-sensitive by default. Specific parts of the URI can be defined as case-insensitive, but that depends on the scheme:
   
   ```
      Some schemes define additional subcomponents that consist of case-
      insensitive data, giving an implicit license to normalizers to
      convert this data to a common case (e.g., all lowercase).  For
      example, URI schemes that define a subcomponent of path to contain an
      Internet hostname, such as the "mailto" URI scheme, cause that
      subcomponent to be case-insensitive and thus subject to case
      normalization (e.g., "mailto:Joe@Example.COM" is equivalent to
      "mailto:Joe@example.com", even though the generic syntax considers
      the path component to be case-sensitive).
   ```
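A scheme-specific rule like the `mailto` one has to be applied per scheme. A hedged sketch, assuming a single-address `mailto:local@domain` URI with an optional `?query`; `normalize_mailto` is a hypothetical helper, and multi-address `mailto` URIs are out of scope.

```python
def normalize_mailto(uri: str) -> str:
    """Lowercase the scheme and domain of a simple "mailto:local@domain" URI.

    Hypothetical helper: the local part (before "@") and any query string
    keep their case; URIs with other schemes are returned unchanged.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "mailto":
        return uri  # not a mailto URI, so no scheme-specific rule applies
    # Separate an optional "?subject=..." query so it is not lowercased.
    addr, q, query = rest.partition("?")
    local, at, domain = addr.rpartition("@")
    if not at:
        return uri  # no "@": nothing recognisable as a hostname to normalize
    return f"mailto:{local}@{domain.lower()}{q}{query}"


print(normalize_mailto("mailto:Joe@Example.COM"))
```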
   
   I think we do not have much freedom there: the host is case-insensitive by definition, so we cannot treat it as case-sensitive.


-- 
This is an automated message from the Apache Git Service.