Posted to user@knox.apache.org by "Willmer, Alex (UK Defence)" <al...@cgi.com> on 2017/05/24 16:32:55 UTC

Encoding/escaping whitespace in WebHDFS requests

How should I encode space characters in the URL when I make a request to WebHDFS through Knox? Or should I be enabling/configuring something in Knox to handle them?

I'm making the following (redacted values in <>) request to WebHDFS, through Knox

curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
     -u <username>:<password> -k -s
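
For illustration, here is the same request as a short Python sketch (same redacted placeholders as above; it assumes the requests library is available - I'm actually using curl as shown):

import urllib.parse
import requests

gateway = "https://<hostname>:18443/gateway/<cluster>"
path = "/docs/filename with spaces.pdf"

# Percent-encode each path segment, keeping "/" intact, so spaces become %20
encoded_path = urllib.parse.quote(path, safe="/")
url = gateway + "/webhdfs/v1" + encoded_path + "?op=OPEN"

# verify=False mirrors curl -k (self-signed gateway certificate)
response = requests.get(url, auth=("<username>", "<password>"), verify=False)
print(response.status_code, response.text[:200])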

However Knox is returning HTTP 404 with the following body (whitespace/formatting added by me)

{"exception":"FileNotFoundException",
 "javaClassName":"java.io.FileNotFoundException",
 "message":"File /docs/filename+with+spaces.pdf not found."}}

I've tried encoding the spaces as + (same result), and not encoding them (HTTP 400 Unknown Version).
If I request a file for which the path does not contain spaces then it works.

Any ideas?

With thanks, Alex



PS In anticipation of queries: I'm using Knox 0.11.0 with OpenJDK 1.8.0_131 on CentOS 7, with an HDP 2.6 (Hadoop 2.7.x) cluster. Kerberos is enabled in the cluster.

The (redacted) response headers for the %20 encoded request

< HTTP/1.1 404 Not Found
< Date: Wed, 24 May 2017 15:34:26 GMT
< Set-Cookie: JSESSIONID=15acwo8gt9qr8gdbvk48y9yjh;Path=/gateway/<cluster>;Secure;HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Set-Cookie: rememberMe=deleteMe; Path=/gateway/cysafa; Max-Age=0; Expires=Tue, 23-May-2017 15:34:26 GMT
< Cache-Control: no-cache
< Expires: Wed, 24 May 2017 15:34:26 GMT
< Date: Wed, 24 May 2017 15:34:26 GMT
< Pragma: no-cache
< Expires: Wed, 24 May 2017 15:34:26 GMT
< Date: Wed, 24 May 2017 15:34:26 GMT
< Pragma: no-cache
< X-FRAME-OPTIONS: SAMEORIGIN
< Content-Type: application/json; charset=UTF-8
< Server: Jetty(6.1.26.hwx)
< Content-Length: 252

The (redacted) Knox logs for the %20 encoded request

==> /var/log/hadoop/knox/gateway-audit.log <==
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404

==> /var/log/hadoop/knox/gateway.log <==
2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true

The (redacted) topology

<topology>
    <gateway>
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <param>
                <name>sessionTimeout</name>
                <value>30</value>
            </param>
            <param>
                <name>main.ldapRealm</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
            </param>
            <param>
                <name>main.ldapContextFactory</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory</name>
                <value>$ldapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.userDnTemplate</name>
                <value>uid={0},cn=users,cn=accounts,dc=<cluster></value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.url</name>
                <value>ldap://<freeipa_node>:389</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
                <value>simple</value>
            </param>
            <param>
                <name>urls./**</name>
                <value>authcBasic</value>
            </param>
        </provider>
        <provider>
            <role>authorization</role>
            <name>AclsAuthz</name>
            <enabled>true</enabled>
            <param>
                <name>knox.acl</name>
                <value>admin;*;*</value>
            </param>
        </provider>
        <provider>
            <role>identity-assertion</role>
            <name>Default</name>
            <enabled>true</enabled>
        </provider>
        <provider>
            <role>hostmap</role>
            <name>static</name>
            <enabled>false</enabled>
            <param><name>localhost</name><value>sandbox,sandbox.hortonworks.com</value></param>
        </provider>
    </gateway>

    <service>
        <role>WEBHDFS</role>
        <url>http://<namenode>:50070/webhdfs</url>
    </service>

    <service>
        <role>SOLRAPI</role>
        <url>http://<solrnode>:6083/solr</url>
    </service>
</topology>


RE: Encoding/escaping whitespace in WebHDFS requests

Posted by "Willmer, Alex (UK Defence)" <al...@cgi.com>.
Kevin, I did use Knox with HDP 2.4. I can't say whether we saw this issue though. Sorry.

Alex

________________________________
From: Kevin Risden [compuwizard123@gmail.com]
Sent: 24 May 2017 23:24
To: user@knox.apache.org
Subject: Re: Encoding/escaping whitespace in WebHDFS requests

Just saw this as I was submitting a potentially related WebHBase URL encoding email to the knox-user list. Curious if they are related.

Alex - out of curiosity, did you use Knox with HDP 2.4 or prior and not see this issue?

Kevin Risden

On Wed, May 24, 2017 at 4:08 PM, larry mccay <la...@gmail.com> wrote:
Thank you, Alex.

Please file a JIRA for this with the above details.
I will try to reproduce and investigate, and see if we can't get it fixed or find a workaround for the 0.13.0 release.
This is planned for the end of next week.

On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <al...@cgi.com> wrote:
Hi Larry,

The same file does work directly from WebHDFS (see below). Looking more closely at the logs I sent previously, it looks like Knox (or something in the chain I'm unaware of) is decoding the %20-encoded spaces and then re-encoding them as + (illustrated with a short sketch after the excerpt), i.e.

17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
..
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
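
For illustration, this is the difference I suspect is at play - path-style percent-encoding versus form-style encoding (a sketch using Python's standard encoders, not Knox's actual code):

from urllib.parse import quote, quote_plus, unquote

segment = "filename with spaces.pdf"

print(quote(segment))       # filename%20with%20spaces.pdf  (path-style encoding)
print(quote_plus(segment))  # filename+with+spaces.pdf      (form/query-style encoding)

# If the gateway decodes the incoming path and re-encodes it with a
# form-style encoder, the NameNode receives a literal '+' in the path,
# so WebHDFS looks for /docs/filename+with+spaces.pdf and returns 404.
print(unquote("filename%20with%20spaces.pdf"))  # filename with spaces.pdf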

With thanks, Alex


Direct WebHDFS request (hostnames redacted)

# curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
HTTP/1.1 401 Authentication required
Cache-Control: must-revalidate,no-cache,no-store
Date: Wed, 24 May 2017 19:01:41 GMT
Pragma: no-cache
Date: Wed, 24 May 2017 19:01:41 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=/; HttpOnly
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1533
Server: Jetty(6.1.26.hwx)

HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 24 May 2017 19:01:42 GMT
Date: Wed, 24 May 2017 19:01:42 GMT
Pragma: no-cache
Expires: Wed, 24 May 2017 19:01:42 GMT
Date: Wed, 24 May 2017 19:01:42 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
Content-Type: application/octet-stream
Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
Content-Length: 0
Server: Jetty(6.1.26.hwx)

HTTP/1.1 200 OK
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
Content-Type: application/octet-stream
Connection: close
Content-Length: 13365618

%PDF-1.6
<</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
...


________________________________
From: larry mccay [lmccay@apache.org]
Sent: 24 May 2017 18:05
To: user@knox.apache.org
Subject: Re: Encoding/escaping whitespace in WebHDFS requests

Hi Alex -

I notice from the audit log that the 404 is actually coming from WebHDFS not from Knox.
Can you confirm that direct access to WebHDFS without going through Knox works with the same URL?

thanks,

--larry






Re: Encoding/escaping whitespace in WebHDFS requests

Posted by larry mccay <lm...@apache.org>.
I doubt that it is actually related.
We do need to determine whether we have blocker bugs, as we are going to try
to close down on 0.13.0 over the next week.



Re: Encoding/escaping whitespace in WebHDFS requests

Posted by Kevin Risden <co...@gmail.com>.
Thanks Larry, yeah, I had stumbled upon KNOX-709. Way more detail is in the
thread "WebHBase URL Encoding Issue". I didn't want to hijack this thread if
the WebHBase issue isn't related.

Kevin Risden


Re: Encoding/escaping whitespace in WebHDFS requests

Posted by larry mccay <lm...@apache.org>.
Hi Kevin -

You may see some change related to
https://issues.apache.org/jira/browse/KNOX-709.

thanks,

--larry


Re: Encoding/escaping whitespace in WebHDFS requests

Posted by Kevin Risden <co...@gmail.com>.
Just saw this as I was submitting a potentially related WebHBase URL
encoding email to the knox-user list. Curious if they are related.

Alex - out of curiosity, did you use Knox with HDP 2.4 or prior and not see
this issue?

Kevin Risden

On Wed, May 24, 2017 at 4:08 PM, larry mccay <la...@gmail.com> wrote:

> Thank you, Alex.
>
> Please file a JIRA for this with the above details.
> I will try and reproduce and investigate and see if we can't get it fixed
> or a workaround for the 0.13.0 release.
> This is planned for the end of next week.
>
> On Wed, May 24, 2017 at 3:18 PM, Willmer, Alex (UK Defence) <
> alex.willmer@cgi.com> wrote:
>
>> Hi Larry,
>>
>> The same file does work directly from WebHDFS (see below). Looking more
>> closely at the logs I sent previously, it looks like Knox (or something in
>> the chain I'm unaware of) is decoding the %20 encoded spaces, then
>> reencoding them as + encoded, i.e.
>>
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>> 6b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename
>> with spaces.pdf?op=OPEN|unavailable|Request method: GET
>> ..
>> 17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f9
>> 6b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<
>> namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf
>> ?op=OPEN&doAs=<username>|success|Response status: 404
>>
>> With thanks, Alex
>>
>>
>> Direct WebHDFS request (hostnames redacted)
>>
>> # curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN"
>> --negotiate -L | head -n40
>> HTTP/1.1 401 Authentication required
>> Cache-Control: must-revalidate,no-cache,no-store
>> Date: Wed, 24 May 2017 19:01:41 GMT
>> Pragma: no-cache
>> Date: Wed, 24 May 2017 19:01:41 GMT
>> Pragma: no-cache
>> X-FRAME-OPTIONS: SAMEORIGIN
>> WWW-Authenticate: Negotiate
>> Set-Cookie: hadoop.auth=; Path=/; HttpOnly
>> Content-Type: text/html; charset=iso-8859-1
>> Content-Length: 1533
>> Server: Jetty(6.1.26.hwx)
>>
>> HTTP/1.1 307 TEMPORARY_REDIRECT
>> Cache-Control: no-cache
>> Expires: Wed, 24 May 2017 19:01:42 GMT
>> Date: Wed, 24 May 2017 19:01:42 GMT
>> Pragma: no-cache
>> Expires: Wed, 24 May 2017 19:01:42 GMT
>> Date: Wed, 24 May 2017 19:01:42 GMT
>> Pragma: no-cache
>> X-FRAME-OPTIONS: SAMEORIGIN
>> WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAg
>> EFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxM
>> zW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
>> Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=14956885020
>> 02&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
>> Content-Type: application/octet-stream
>> Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%
>> 20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ
>> 8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVn
>> YXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
>> Content-Length: 0
>> Server: Jetty(6.1.26.hwx)
>>
>> HTTP/1.1 200 OK
>> Access-Control-Allow-Methods: GET
>> Access-Control-Allow-Origin: *
>> Content-Type: application/octet-stream
>> Connection: close
>> Content-Length: 13365618
>>
>> %����1.6
>> <</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
>> ...
>>
>>

RE: Encoding/escaping whitespace in WebHDFS requests

Posted by "Willmer, Alex (UK Defence)" <al...@cgi.com>.
Thanks Larry,

I've raised https://issues.apache.org/jira/browse/KNOX-949



Re: Encoding/escaping whitespace in WebHDFS requests

Posted by larry mccay <la...@gmail.com>.
Thank you, Alex.

Please file a JIRA for this with the above details.
I will try to reproduce and investigate, and see whether we can get it fixed,
or at least a workaround in place, for the 0.13.0 release.
This is planned for the end of next week.


RE: Encoding/escaping whitespace in WebHDFS requests

Posted by "Willmer, Alex (UK Defence)" <al...@cgi.com>.
Hi Larry,

The same file does work directly from WebHDFS (see below). Looking more closely at the logs I sent previously, it looks like Knox (or something in the chain I'm unaware of) is decoding the %20-encoded spaces and then re-encoding them as +, i.e.

17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
..
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
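
For reference: percent-encoding is the only valid way to escape a space in a URL path; '+' only stands for a space in the query component (form encoding). A minimal Python sketch, illustrative only and not Knox's actual code, of why the rewrite shown in the audit log produces a path WebHDFS cannot find:

from urllib.parse import quote, quote_plus, unquote

path = "/docs/filename with spaces.pdf"

# Correct escaping for a URL *path*: spaces become %20
print(quote(path))        # -> /docs/filename%20with%20spaces.pdf

# Form-style escaping: spaces become '+' (and '/' is escaped too)
print(quote_plus(path))   # -> %2Fdocs%2Ffilename+with+spaces.pdf

# If a gateway decodes %20 and then re-encodes the path form-style,
# the origin server sees a literal '+' in the file name:
rewritten = quote(path).replace("%20", "+")
print(unquote(rewritten)) # -> /docs/filename+with+spaces.pdf (the file WebHDFS reports missing)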

With thanks, Alex


Direct WebHDFS request (hostnames redacted)

# curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
HTTP/1.1 401 Authentication required
Cache-Control: must-revalidate,no-cache,no-store
Date: Wed, 24 May 2017 19:01:41 GMT
Pragma: no-cache
Date: Wed, 24 May 2017 19:01:41 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=/; HttpOnly
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1533
Server: Jetty(6.1.26.hwx)

HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 24 May 2017 19:01:42 GMT
Date: Wed, 24 May 2017 19:01:42 GMT
Pragma: no-cache
Expires: Wed, 24 May 2017 19:01:42 GMT
Date: Wed, 24 May 2017 19:01:42 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
Content-Type: application/octet-stream
Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
Content-Length: 0
Server: Jetty(6.1.26.hwx)

HTTP/1.1 200 OK
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
Content-Type: application/octet-stream
Connection: close
Content-Length: 13365618

%PDF-1.6
<</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
...
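
For completeness, the gateway-side curl call at the top of this thread can be scripted the same way. The sketch below is illustrative only: host, port, credentials and the disabled certificate check are placeholders (basic auth against the ShiroProvider is assumed, as in the topology). Its point is that the client sends the space as %20, so any '+' that WebHDFS later sees is introduced after the client:

import base64
import ssl
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen

gateway = "https://knox-host:18443/gateway/cluster"   # placeholder
hdfs_path = "/docs/filename with spaces.pdf"
user, password = "user", "password"                   # placeholders

# quote() emits %20 for the space, matching the curl request in this thread
url = gateway + "/webhdfs/v1" + quote(hdfs_path) + "?op=OPEN"

ctx = ssl.create_default_context()
ctx.check_hostname = False       # equivalent of curl -k; do not do this in production
ctx.verify_mode = ssl.CERT_NONE

req = Request(url)
req.add_header("Authorization",
               "Basic " + base64.b64encode(f"{user}:{password}".encode()).decode())

try:
    with urlopen(req, context=ctx) as resp:
        print(resp.status, resp.headers.get("Content-Type"))
except HTTPError as err:
    print(err.code, err.read().decode())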




Re: Encoding/escaping whitespace in WebHDFS requests

Posted by larry mccay <lm...@apache.org>.
Hi Alex -

I notice from the audit log that the 404 is actually coming from WebHDFS, not from Knox.
Can you confirm that direct access to WebHDFS without going through Knox
works with the same URL?

thanks,

--larry
