Posted to issues@nifi.apache.org by "Kevin Doran (Jira)" <ji...@apache.org> on 2020/09/11 18:58:00 UTC
[jira] [Comment Edited] (NIFI-7744) Add support for character sets other than US-ASCII in X-ProxiedEntitiesChain Header

    [ https://issues.apache.org/jira/browse/NIFI-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194486#comment-17194486 ] 

Kevin Doran edited comment on NIFI-7744 at 9/11/20, 6:57 PM:
-------------------------------------------------------------

Ultimately decided on the following (see [https://github.com/apache/nifi-registry/pull/302]):

Change the creation of the {{X-ProxiedEntitiesChain}} header values to:
 # Detect non-ascii characters in an input identity/DN
 # If non-ascii characters are present, base64 encode the value and wrap it in additional {{<angle braces>}}

For example, a proxy chain of

{{Алйс, CN=nifi.apache.org}}

will result in:

{{X-ProxiedEntitiesChain: <<0JDQu9C50YE=>><CN=nifi.apache.org>}}
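
As a rough illustration of the encoding steps above, here is a minimal Python sketch. It is not the code from the linked pull request; the function names {{format_proxied_entity}} and {{build_chain_header}} are hypothetical, and the escaping of literal {{<}} and {{>}} inside an identity is deliberately left out.

```python
import base64


def format_proxied_entity(identity: str) -> str:
    """Format one identity/DN as a token of the chain header (hypothetical helper)."""
    if identity.isascii():
        # ASCII-only identities keep the legacy <identity> form unchanged.
        # Note: escaping of literal '<' and '>' is omitted in this sketch.
        return f"<{identity}>"
    # Non-ASCII identities are UTF-8 encoded, then base64 encoded,
    # then wrapped in an additional pair of angle braces.
    encoded = base64.b64encode(identity.encode("utf-8")).decode("ascii")
    return f"<<{encoded}>>"


def build_chain_header(identities) -> str:
    """Concatenate the formatted tokens into a single header value."""
    return "".join(format_proxied_entity(identity) for identity in identities)


print(build_chain_header(["Алйс", "CN=nifi.apache.org"]))
```

For the example chain above, this should print the same header value as shown.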

Additionally, logic for tokenizing these header values should be updated to:
 # Detect if a token is encoded (by the presence of additional {{<angle braces>}})
 # If encoded, decode it as part of the tokenization process.
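
The tokenization side could look roughly like the following Python sketch. It assumes tokens are either {{<<base64>>}} or {{<plain>}} and that plain tokens contain no escaped angle braces; the name {{tokenize_chain_header}} and the regular expression are illustrative assumptions, not the actual NiFi Registry implementation.

```python
import base64
import re

# Either an encoded token <<base64>> or a legacy plain token <value>.
# Assumption: plain tokens contain no escaped '<' or '>' characters.
_TOKEN = re.compile(r"<<([A-Za-z0-9+/=]*)>>|<([^<>]*)>")


def tokenize_chain_header(header_value: str):
    """Split a chain header value into identities, decoding encoded tokens."""
    identities = []
    for match in _TOKEN.finditer(header_value):
        encoded, plain = match.groups()
        if encoded is not None:
            # The additional angle braces mark a base64 encoded UTF-8 identity.
            identities.append(base64.b64decode(encoded).decode("utf-8"))
        else:
            # Legacy ASCII token: used as-is.
            identities.append(plain)
    return identities


print(tokenize_chain_header("<<0JDQu9C50YE=>><CN=nifi.apache.org>"))
```

A header containing only legacy tokens, such as {{<alice><bob>}}, passes through the plain branch, so values from before this change are tokenized without any decoding.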

These changes should be completely backwards compatible, as entities containing only ascii characters are encoded exactly the same as they previously were. 

A number of implementation approaches were considered, including ones based on RFC-2231 and RFC-8187. Ultimately, approaches that relied on percent-encoding were rejected, as percent-encoding outside of URL encoding is not widely supported in standard libraries and is difficult to implement correctly given the complexity of unicode and things such as surrogate characters. Given that, in theory, third-party components such as proxies may need to implement this logic, base64 was chosen as it is widely available in all languages and frameworks (particularly in reverse proxies, where base64 is used as part of basic auth header formation).

The use of double {{<<angle braces>>}} to indicate encoding was chosen as it allows for an easy way for a decoder to determine if the value is encoded (necessary because encoding is optional for purely ascii entities), and because the reserved characters {{<}} and {{>}} are already protected/escaped in our sanitization process.

I considered but decided against always base64 encoding the header value, because encoding only when necessary (1) maintains backwards compatibility between nifi and registry on different versions and (2) keeps plaintext, non-encoded values, which are still the majority of use-cases, easy for users to troubleshoot.

If this approach is satisfactory, let me know and I will make corresponding changes in NiFi. 



> Add support for character sets other than US-ASCII in X-ProxiedEntitiesChain Header
> -----------------------------------------------------------------------------------
>
>                 Key: NIFI-7744
>                 URL: https://issues.apache.org/jira/browse/NIFI-7744
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Kevin Doran
>            Assignee: Kevin Doran
>            Priority: Major
>
> NiFi and NiFi Registry both support the concept of an authorized proxy making a web request on behalf of another authenticated user.
> This is implemented as follows:
>  * The proxy authenticates using two-way TLS (mutual auth) with a client certificate. The DN of the client certificate is authenticated as a user, whereas the actual end user performing the action is passed in the X-ProxiedEntitiesChain custom header in the form <userId1><userId2><userId3>...
>  * The client certificate DN must be authorized (by the access policy provider) to act as a trusted proxy
>  * The proxied identity must be authorized to perform the desired action
> There is a shortcoming with this approach, which is that user identities can use a larger character set (Unicode / UTF-8) than HTTP Headers (US ASCII). 
> This ticket proposes adding a backward-compatible extension to the X-ProxiedEntitiesChain header value syntax such that languages and character sets other than US-ASCII can be encoded into the value.
> The exact encoding mechanism is secondary to the goal of this ticket. However, there are two relevant draft Internet Standards that are worth considering: [RFC-2231|https://tools.ietf.org/html/rfc2231] and [RFC-8187|https://tools.ietf.org/html/rfc8187] which is a more prescriptive simplification of RFC-2231.
> Following the method outlined in RFC-8187, the new header syntax would look something like this, in which utf-8 characters outside the ascii attr-char set are octet encoded and then percent encoded:
> Given the raw entity chain string of <Алйс><Боб>:
> {noformat}
> X-ProxiedEntitiesChain: encoded; value*=utf-8''%3C%D0%90%D0%BB%D0%B9%D1%81%3E%3C%D0%91%D0%BE%D0%B1%3E{noformat}
> Alternatively, we could disregard RFCs 2231 and 8187, and use our own encoding scheme such as base64: 
> {noformat}
> X-ProxiedEntitiesChain: base64; value=PNCQ0LvQudGBPjzQkdC+0LE+{noformat}
>  
> In either case, the raw header value would first be read and parsed (matched against a magic string or regex) to see if it is an encoded value or legacy, raw ascii value. If encoded, nifi and nifi registry would first decode the value before proceeding with the normal logic. If not encoded, the behavior would be unchanged from how it currently works, and the raw string would be interpreted as a proxied entity chain.
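
For reference, the two candidate header values quoted in the issue description can be reproduced with the Python standard library. This is only a sketch for comparing the alternatives; the function names are hypothetical, and {{urllib.parse.quote}} percent-encodes somewhat more characters than RFC-8187 strictly requires.

```python
import base64
from urllib.parse import quote


def rfc8187_style_value(chain: str) -> str:
    """Percent-encode the UTF-8 octets of the whole chain (RFC-8187 style)."""
    # With safe="", quote() encodes every octet except A-Z a-z 0-9 _ . - ~
    return "encoded; value*=utf-8''" + quote(chain, safe="", encoding="utf-8")


def base64_style_value(chain: str) -> str:
    """Base64 encode the UTF-8 octets of the whole chain."""
    return "base64; value=" + base64.b64encode(chain.encode("utf-8")).decode("ascii")


chain = "<Алйс><Боб>"
print("X-ProxiedEntitiesChain: " + rfc8187_style_value(chain))
print("X-ProxiedEntitiesChain: " + base64_style_value(chain))
```

Both variants encode the entire chain at once, in contrast to the per-identity encoding that was ultimately chosen.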


