You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/05/06 23:21:30 UTC

[jira] Commented: (HADOOP-4707) Improvements to Hadoop Thrift bindings

    [ https://issues.apache.org/jira/browse/HADOOP-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706595#action_12706595 ] 

Todd Lipcon commented on HADOOP-4707:
-------------------------------------

Made some more iterations on this. As this is growing somewhat large (especially with the diff including all the generated code), I've also put a branch up on github for those wanting to follow along in a more easily parseable fashion:

http://github.com/toddlipcon/hadoop/commits/hadoop-4707

I'll also upload an up to date patch momentarily.

Summary of changes since last upload:
*      Avoid NPE when getBlocks is called with offset past end of file - instead, return no blocks
*      Make stat throw FileNotFoundException for bad paths
*      Add a clazz field to IOException thrift type for the specific subclass.
*      Add thriftfs to contrib test target
*      Refactor thrift contrib into more classes to clean things up a bit
*      HADOOP-4707: Add RequestContext parameter to all RPCs and extract UGI from them
*      HADOOP-4707: Add a basic DFS Health Report call
*      HADOOP-4707: Make services inherit from a base service that provides VersionInfo
*      HADOOP-4707: Expose JVM heap information and a stack trace in base
*      HADOOP-4707: Add access to Metrics in base service

I'd also like to propose moving this code into contrib/thrift rather than contrib/thriftfs. The motivation for this is that I will soon be working on JobTracker and TaskTracker thrift plugins, and there's a lot of common code I want to share. Putting that code in a module called "thriftfs" seems incorrect. Additionally, contrib/thriftfs currently is the home for two entirely separate projects - this, and the earlier "proxy style" thrift wrapper of the DFS client. Since these two projects share no code, it makes no sense for them to be in the same contrib dir.

If no one has any objections, I'll do this move in the next day or two.

> Improvements to Hadoop Thrift bindings
> --------------------------------------
>
>                 Key: HADOOP-4707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4707
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/thiftfs
>    Affects Versions: 0.20.0
>         Environment: Tested under Linux x86-64
>            Reporter: Carlos Valiente
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: all.diff, BlockManager.java, build_xml.diff, DefaultBlockManager.java, DFSBlockManager.java, gen.diff, HADOOP-4707-55c046a.txt, hadoop-4707-6bc958.txt, HADOOP-4707.diff, HADOOP-4707.patch, HADOOP-4707.patch, hadoopfs_thrift.diff, hadoopthriftapi.jar, HadoopThriftServer.java, HadoopThriftServer_java.diff, hdfs.py, hdfs_py_venky.diff, libthrift.jar, libthrift.jar, libthrift.jar
>
>
> I have made the following changes to hadoopfs.thrift:
> #  Added namespaces for Python, Perl and C++.
> # Renamed parameters and struct members to camelCase versions to keep them consistent (in particular FileStatus{blockReplication,blockSize} vs FileStatus.{block_replication,blocksize}).
> # Renamed ThriftHadoopFileSystem to FileSystem. From the perspective of a Perl/Python/C++ user, 1) it is already clear that we're using Thrift, and 2) the fact that we're dealing with Hadoop is already explicit in the namespace.  The usage of generated code is more compact and (in my opinion) clearer:
> {quote}
>         *Perl*:
>         use HadoopFS;
>         my $client = HadoopFS::FileSystemClient->new(..);
>          _instead of:_
>         my $client = HadoopFS::ThriftHadoopFileSystemClient->new(..);
>         *Python*:
>         from hadoopfs import FileSystem
>         client = FileSystem.Client(..)
>         _instead of_
>         from hadoopfs import ThriftHadoopFileSystem
>         client = ThriftHadoopFileSystem.Client(..)
>         (See also the attached diff [^scripts_hdfs_py.diff] for the
>          new version of 'scripts/hdfs.py').
>         *C++*:
>         hadoopfs::FileSystemClient client(..);
>          _instead of_:
>         hadoopfs::ThriftHadoopFileSystemClient client(..);
> {quote}
> # Renamed ThriftHandle to FileHandle: As in 3, it is clear that we're dealing with a Thrift object, and its purpose (to act as a handle for file operations) is clearer.
> # Renamed ThriftIOException to IOException, to keep it simpler, and consistent with MalformedInputException.
> # Added explicit version tags to fields of ThriftHandle/FileHandle, Pathname, MalformedInputException and ThriftIOException/IOException, to improve compatibility of existing clients with future versions of the interface which might add new fields to those objects (like stack traces for the exception types, for instance).
> Those changes are reflected in the attachment [^hadoopfs_thrift.diff].
> Changes in generated Java, Python, Perl and C++ code are also attached in [^gen.diff]. They were generated by a Thrift checkout from trunk
> ([http://svn.apache.org/repos/asf/incubator/thrift/trunk/]) as of revision
> 719697, plus the following Perl-related patches:
> * [https://issues.apache.org/jira/browse/THRIFT-190]
> * [https://issues.apache.org/jira/browse/THRIFT-193]
> * [https://issues.apache.org/jira/browse/THRIFT-199]
> The Thrift jar file [^libthrift.jar] built from that Thrift checkout is also attached, since it's needed to run the Java Thrift server.
> I have also added a new target to src/contrib/thriftfs/build.xml to build the Java bindings needed for org.apache.hadoop.thriftfs.HadoopThriftServer.java (see attachment [^build_xml.diff] and modified HadoopThriftServer.java to make use of the new bindings (see attachment [^HadoopThriftServer_java.diff]).
> The jar file [^lib/hadoopthriftapi.jar] is also included, although it can be regenerated from the stuff under 'gen-java' and the new 'compile-gen' Ant target.
> The whole changeset is also included as [^all.diff].

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.