Posted to dev@knox.apache.org by "Alex Willmer (JIRA)" <ji...@apache.org> on 2017/05/25 12:28:04 UTC

[jira] [Created] (KNOX-949) WebHDFS proxy replaces %20 encoded spaces in URL with + encoding

Alex Willmer created KNOX-949:
---------------------------------

             Summary: WebHDFS proxy replaces %20 encoded spaces in URL with + encoding
                 Key: KNOX-949
                 URL: https://issues.apache.org/jira/browse/KNOX-949
             Project: Apache Knox
          Issue Type: Bug
    Affects Versions: 0.11.0
            Reporter: Alex Willmer


If a file with spaces in its name (e.g. 'foo bar.txt') is requested from HDFS through WebHDFS and Knox, Knox rewrites the %20 encoding in the client request to + encoding. WebHDFS then returns an HTTP 404, which Knox passes back to the client. Requesting the same file directly from WebHDFS works. Example:

curl "https://<hostname>:18443/gateway/<cluster>/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" \
     -u <username>:<password> -k -s

Knox response

{"exception":"FileNotFoundException",
 "javaClassName":"java.io.FileNotFoundException",
 "message":"File /docs/filename+with+spaces.pdf not found."}}

Knox logs

==> /var/log/hadoop/knox/gateway-audit.log <==
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS||||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|unavailable|Request method: GET
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authentication|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Groups: []
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||authorization|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|unavailable|Request method: GET
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||dispatch|uri|http://<namenode>.<cluster>:50070/webhdfs/v1/docs/filename+with+spaces.pdf?op=OPEN&doAs=<username>|success|Response status: 404
17/05/24 15:51:05 ||88ce58ea-d7c5-46cd-a87a-c2f96b38130e|audit|WEBHDFS|<username>|||access|uri|/gateway/<cluster>/webhdfs/v1/docs/filename with spaces.pdf?op=OPEN|success|Response status: 404

==> /var/log/hadoop/knox/gateway.log <==
2017-05-24 15:51:05,254 INFO  hadoop.gateway (KnoxLdapRealm.java:getUserDn(691)) - Computed userDn: uid=<username>,cn=users,cn=accounts,dc=<cluster> using dnTemplate for principal: <username>
2017-05-24 15:51:05,259 INFO  hadoop.gateway (AclsAuthorizationFilter.java:doFilter(85)) - Access Granted: true
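
In the audit log above, the inbound access URI is recorded with a literal space, while the dispatched URI carries '+'. That pattern is what one would see if a decoded path were re-encoded with form-style (application/x-www-form-urlencoded) rules instead of path percent-encoding. The sketch below only illustrates the difference between the two encodings, using Python's standard urllib.parse; it is not Knox code, and the actual cause inside Knox is not established by this report.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# File name taken from the example request in this report.
name = "filename with spaces.pdf"

# Percent-encoding, appropriate for a URL path: space becomes %20.
path_encoded = quote(name)

# Form-style encoding, intended for query strings and form bodies:
# space becomes '+'.
form_encoded = quote_plus(name)

print(path_encoded)   # filename%20with%20spaces.pdf
print(form_encoded)   # filename+with+spaces.pdf

# A path decoder treats '+' as a literal plus sign, so a form-encoded
# name placed in a path does not decode back to the original.
decoded_as_path = unquote(form_encoded)
print(decoded_as_path == name)            # False

# Only a form-style decoder turns '+' back into a space.
print(unquote_plus(form_encoded) == name) # True
```

This matches the 404 message in the Knox response: the server on the receiving end looks for a file whose name literally contains '+' characters.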

Direct WebHDFS request for the same file

# curl -si -u: "http://<namenode>:50070/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN" --negotiate -L | head -n40
HTTP/1.1 401 Authentication required
Cache-Control: must-revalidate,no-cache,no-store
Date: Wed, 24 May 2017 19:01:41 GMT
Pragma: no-cache
Date: Wed, 24 May 2017 19:01:41 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=/; HttpOnly
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1533
Server: Jetty(6.1.26.hwx)

HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 24 May 2017 19:01:42 GMT
Date: Wed, 24 May 2017 19:01:42 GMT
Pragma: no-cache
Expires: Wed, 24 May 2017 19:01:42 GMT
Date: Wed, 24 May 2017 19:01:42 GMT
Pragma: no-cache
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate YGkGCSqGSIb3EgECAgIAb1owWKADAgEFoQMCAQ+iTDBKoAMCARKiQwRBQM/auuLcl2xey6wMp6EjCPJFSqK3snscxMzW7RvfgxOo7182GzD5N9jf+OWGr+tjpvlRX0c/7iTBfYKSetf4ekU=
Set-Cookie: hadoop.auth="u=admin&p=admin@CYSAFA&t=kerberos&e=1495688502002&s=b7p35TgaxItAUTkKJuSXuynoq9E="; Path=/; HttpOnly
Content-Type: application/octet-stream
Location: http://<datanode3>:1022/webhdfs/v1/docs/filename%20with%20spaces.pdf?op=OPEN&delegation=HgAFYWRtaW4FYWRtaW4AigFcO9YJ8ooBXF_ijfJFAxSBYFUnsXY3up11ZNIi4hIi__5RvRJXRUJIREZTIGRlbGVnYXRpb24PMTcyLjE4LjAuOTo4MDIw&namenoderpcaddress=<namenode>:8020&offset=0
Content-Length: 0
Server: Jetty(6.1.26.hwx)

HTTP/1.1 200 OK
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
Content-Type: application/octet-stream
Connection: close
Content-Length: 13365618

%PDF-1.6
<</Filter/FlateDecode/First 157/Length 5350/N 16/Type/ObjStm>>stream
...

See also

 - http://mail-archives.apache.org/mod_mbox/knox-user/201705.mbox/%3C335C4DD06CF6C24EAA7A73F44D43D7CB4E6EB300%40SE-EX021.groupinfra.com%3E


