Posted to dev@knox.apache.org by "Maksim Kononenko (JIRA)" <ji...@apache.org> on 2013/07/19 11:30:49 UTC
[jira] [Commented] (KNOX-22) Invoke HDFS via gateway using hadoop CLI and FileSystem API
[ https://issues.apache.org/jira/browse/KNOX-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713488#comment-13713488 ]
Maksim Kononenko commented on KNOX-22:
--------------------------------------
Here is a table that I found which shows the HDFS APIs:

File System | Comm. Method | Scheme / Prefix / Port | Read / Write | Cross Version
HDFS        | RPC          | hdfs://...:8020        | Read / Write | Same HDFS version only
HFTP        | HTTP         | hftp://...:50070       | Read only    | Version independent
WebHDFS     | HTTP (REST)  | webhdfs://...:50070    | Read / Write | Version independent
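As a quick illustration of the table, the three schemes parse cleanly with plain java.net.URI (the namenode hostname below is a placeholder, not from the issue; only the ports come from the table):

```java
import java.net.URI;

public class HdfsSchemes {
    public static void main(String[] args) {
        // Placeholder namenode host; default ports as in the table above.
        String[] uris = {
            "hdfs://namenode.example.com:8020/user/root",     // native RPC
            "hftp://namenode.example.com:50070/user/root",    // read-only HTTP
            "webhdfs://namenode.example.com:50070/user/root"  // REST over HTTP
        };
        for (String s : uris) {
            URI u = URI.create(s);
            System.out.println(u.getScheme() + " -> port " + u.getPort());
        }
    }
}
```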
1. HDFS.
It uses sockets as the transport mechanism and has "pluggable" marshalling protocol support.
For marshalling it implements two protocols:
- Google Protocol Buffers;
- raw bytes read/write:
Code extract:
public void write(DataOutput out) throws IOException {
  out.writeLong(rpcVersion);
  UTF8.writeString(out, declaringClassProtocolName);
  UTF8.writeString(out, methodName);
  out.writeLong(clientVersion);
  out.writeInt(clientMethodsHash);
  out.writeInt(parameterClasses.length);
  for (int i = 0; i < parameterClasses.length; i++) {
    ObjectWritable.writeObject(out, parameters[i], parameterClasses[i],
        conf, true);
  }
}
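The raw-bytes path above is just primitives and strings written to a DataOutput in a fixed order, with no schema on the wire. A minimal stand-alone sketch of the same idea, with Hadoop's UTF8 and ObjectWritable replaced by plain java.io calls (so this illustrates the pattern, not Hadoop's actual wire format):

```java
import java.io.*;

public class RawMarshalDemo {
    // Write a tiny "call header" the same way the extract does:
    // fields in a fixed order, nothing self-describing on the wire.
    static byte[] writeCall(long rpcVersion, String protocol, String method)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeLong(rpcVersion);
        out.writeUTF(protocol);   // stand-in for UTF8.writeString
        out.writeUTF(method);
        out.flush();
        return buf.toByteArray();
    }

    // The reader must consume fields in exactly the same order,
    // which is why both sides must agree on the version.
    static String readMethod(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        in.readLong();   // rpcVersion
        in.readUTF();    // protocol name
        return in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeCall(9L, "ClientProtocol", "getListing");
        System.out.println(readMethod(bytes)); // prints "getListing"
    }
}
```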
However, the use of Google Protocol Buffers is hardcoded; I didn't find any mechanism for switching between these marshalling protocols.
This API requires the client and server to run the same version.
2. HFTP.
Works over plain HTTP.
Here is an example of the URL generated for the "ls" command:
http://host:50070/listPaths/?ugi=root,root
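Assembling that URL is simple string concatenation; a small sketch, where the host, port, and ugi values are placeholders matching the example above:

```java
public class HftpUrlDemo {
    // Builds a listPaths URL in the shape HFTP's HTTP interface expects.
    static String listPathsUrl(String host, int port, String ugi) {
        return "http://" + host + ":" + port + "/listPaths/?ugi=" + ugi;
    }

    public static void main(String[] args) {
        System.out.println(listPathsUrl("host", 50070, "root,root"));
        // prints http://host:50070/listPaths/?ugi=root,root
    }
}
```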
3. WebHDFS.
Works over HTTP (REST).
I tried to configure the hadoop CLI to work through the gateway, but my attempt failed.
I found the following in the code for the webhdfs FileSystem API.
The URL for a connection is formed as:
"http" + nnAddr.getHostName() + ":" + nnAddr.getPort() + "/webhdfs/v1/" + path + '?' + query
The strings in quotes are hardcoded, so we can't change the scheme or the context path.
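To make the problem concrete, here is a stand-alone sketch of that concatenation, with nnAddr replaced by plain host/port parameters (an illustration, not the actual Hadoop source). Because the scheme and the "/webhdfs/v1/" context path are string literals, only the host, port, path, and query can vary, so the client cannot be pointed at a gateway URL with a different scheme or path prefix without patching the code:

```java
public class WebHdfsUrlDemo {
    // Mirrors the hardcoded URL formation quoted above: scheme and
    // context path are literals; only host/port/path/query vary.
    static String toUrl(String host, int port, String path, String query) {
        return "http://" + host + ":" + port + "/webhdfs/v1/" + path + '?' + query;
    }

    public static void main(String[] args) {
        System.out.println(toUrl("namenode", 50070, "user/root", "op=LISTSTATUS"));
        // prints http://namenode:50070/webhdfs/v1/user/root?op=LISTSTATUS
    }
}
```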
As an authentication mechanism, the hadoop CLI supports only Kerberos.
> Invoke HDFS via gateway using hadoop CLI and FileSystem API
> -----------------------------------------------------------
>
> Key: KNOX-22
> URL: https://issues.apache.org/jira/browse/KNOX-22
> Project: Apache Knox
> Issue Type: New Feature
> Components: ClientDSL
> Affects Versions: 0.2.0
> Reporter: Kevin Minder
> Assignee: Maksim Kononenko
>
> From BUG-4301
> It should be possible to use the existing HDFS clients to access HDFS via the gateway. These existing clients are the hadoop cli and the FileSystem Java API.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira