Posted to dev@knox.apache.org by "Maksim Kononenko (JIRA)" <ji...@apache.org> on 2013/07/19 11:30:49 UTC

[jira] [Commented] (KNOX-22) Invoke HDFS via gateway using hadoop CLI and FileSystem API

    [ https://issues.apache.org/jira/browse/KNOX-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713488#comment-13713488 ] 

Maksim Kononenko commented on KNOX-22:
--------------------------------------

Here is a table that I found which shows HDFS APIs.

File System   Comm. Method   Scheme / Prefix / Port   Read / Write   Cross Version
HDFS          RPC            hdfs://...:8020          Read / Write   Same HDFS version only
HFTP          HTTP           hftp://...:50070         Read only      Version independent
WebHDFS       HTTP (REST)    webhdfs://...:50070      Read / Write   Version independent

1. HDFS.
It uses sockets as the transport mechanism and is designed for "pluggable" marshalling protocol support.
Two marshalling protocols are implemented:
- Google Protocol Buffers;
- raw byte read/write.
Code extract:
public void write(DataOutput out) throws IOException {
  out.writeLong(rpcVersion);
  UTF8.writeString(out, declaringClassProtocolName);
  UTF8.writeString(out, methodName);
  out.writeLong(clientVersion);
  out.writeInt(clientMethodsHash);
  out.writeInt(parameterClasses.length);
  for (int i = 0; i < parameterClasses.length; i++) {
    ObjectWritable.writeObject(out, parameters[i], parameterClasses[i],
                               conf, true);
  }
}

However, the use of Google Protocol Buffers is hardcoded - I did not find any mechanism for switching between these marshalling protocols.

This API requires the client and server to run the same version.
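To illustrate why the raw-bytes protocol is version-sensitive, here is a self-contained sketch (not actual Hadoop code; class and method names are invented for illustration) of the framing style used in the extract above: fields are written in a fixed order with fixed types, so both ends must agree on the exact layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative sketch of Writable-style raw-bytes RPC framing.
// The field order and types are fixed by the protocol, which is why
// client and server must agree on the exact version.
public class RawRpcFraming {

    static byte[] writeCall(long rpcVersion, String protocol, String method,
                            long clientVersion) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeLong(rpcVersion);    // must match the server's expectation
        out.writeUTF(protocol);       // stand-in for UTF8.writeString(...)
        out.writeUTF(method);
        out.writeLong(clientVersion); // a mismatch gets the call rejected
        out.flush();
        return buf.toByteArray();
    }

    static String readMethodName(byte[] frame) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        in.readLong();                // rpcVersion
        in.readUTF();                 // protocol name
        return in.readUTF();          // method name
    }

    public static void main(String[] args) throws IOException {
        byte[] frame = writeCall(9L, "ClientProtocol", "getListing", 63L);
        System.out.println(readMethodName(frame)); // prints "getListing"
    }
}
```

Because there is no self-describing schema in the byte stream, a reader built against a different field layout would silently misinterpret the frame - unlike Protocol Buffers, which tags each field.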

2. HFTP.
Works over plain HTTP.
Here is an example of the URL generated for the "ls" command:
http://host:50070/listPaths/?ugi=root,root
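A minimal sketch of how such a URL could be assembled (the host, port, path, and ugi values are assumptions for illustration, not taken from the HFTP source):

```java
// Hedged sketch: rebuilds the HFTP-style listPaths URL shown above
// from its parts. All parameter values here are illustrative.
public class HftpUrlSketch {

    static String listPathsUrl(String host, int port, String path, String ugi) {
        // "/listPaths" is the servlet path HFTP uses for directory listings;
        // the ugi query parameter carries the user/group identity.
        return "http://" + host + ":" + port + "/listPaths" + path + "?ugi=" + ugi;
    }

    public static void main(String[] args) {
        System.out.println(listPathsUrl("host", 50070, "/", "root,root"));
        // prints "http://host:50070/listPaths/?ugi=root,root"
    }
}
```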

3. WebHDFS.
Works over HTTP (REST).

I tried to configure the hadoop CLI to work through the gateway, but my attempt failed.
I found the following in the code for the webhdfs FileSystem API.
The connection URL is formed as
"http" + nnAddr.getHostName() + ":" + nnAddr.getPort() + "/webhdfs/v1/" + path + '?' + query
The strings in quotes are hardcoded, so we can't change the scheme or the context path.
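The problem can be demonstrated with a self-contained sketch of that concatenation (a paraphrase, not the exact Hadoop source; the NameNode address, path, and query values are assumed for illustration). Because the scheme and the "/webhdfs/v1/" context path are string literals, there is no configuration point to substitute the gateway's https scheme or its gateway context path:

```java
import java.net.InetSocketAddress;

// Sketch of the hardcoded WebHDFS URL assembly described above.
// Only the host, port, path, and query are variable; the scheme and
// the "/webhdfs/v1/" context path are fixed string literals.
public class WebHdfsUrlSketch {

    static String toUrl(InetSocketAddress nnAddr, String path, String query) {
        return "http://" + nnAddr.getHostName() + ":" + nnAddr.getPort()
                + "/webhdfs/v1/" + path + "?" + query;
    }

    public static void main(String[] args) {
        InetSocketAddress nn = InetSocketAddress.createUnresolved("namenode", 50070);
        System.out.println(toUrl(nn, "tmp", "op=LISTSTATUS"));
        // prints "http://namenode:50070/webhdfs/v1/tmp?op=LISTSTATUS"
    }
}
```

A gateway-friendly client would need the scheme and context path to be configurable (or read from the filesystem URI) instead of concatenated as literals.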


As an authentication mechanism, the Hadoop CLI supports only Kerberos.
                
> Invoke HDFS via gateway using hadoop CLI and FileSystem API
> -----------------------------------------------------------
>
>                 Key: KNOX-22
>                 URL: https://issues.apache.org/jira/browse/KNOX-22
>             Project: Apache Knox
>          Issue Type: New Feature
>          Components: ClientDSL
>    Affects Versions: 0.2.0
>            Reporter: Kevin Minder
>            Assignee: Maksim Kononenko
>
> From BUG-4301
> It should be possible to use the existing HDFS clients to access HDFS via the gateway.  These existing clients are the hadoop cli and the FileSystem Java API.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira